GENERAL EMPIRICAL BAYES WAVELET METHODS AND
EXACTLY ADAPTIVE MINIMAX ESTIMATION

By Cun-Hui Zhang 

Rutgers University 

In many statistical problems, stochastic signals can be represented
as a sequence of noisy wavelet coefficients. In this paper, we
develop general empirical Bayes methods for the estimation of the true
signal. Our estimators approximate certain oracle separable rules and
achieve adaptation to ideal risks and exact minimax risks in broad 
collections of classes of signals. In particular, our estimators are uni- 
formly adaptive to the minimum risk of separable estimators and the 
exact minimax risks simultaneously in Besov balls of all smoothness 
and shape indices, and they are uniformly superefficient in conver- 
gence rates in all compact sets in Besov spaces with a finite secondary 
shape parameter. Furthermore, in classes nested between Besov balls 
of the same smoothness index, our estimators dominate threshold 
and James-Stein estimators within an infinitesimal fraction of the 
minimax risks. More general block empirical Bayes estimators are 
developed. Both white noise with drift and nonparametric regression 
are considered. 



1. Introduction. Suppose a sequence $y = \{y_{jk}\}$ of infinite length is
observed, with

(1.1) $y_{jk} = \beta_{jk} + \varepsilon z_{jk}, \quad 1 \le k \le \max(2^j, 1), \quad j = -1, 0, 1, \ldots,$

where $\varepsilon > 0$ and $z_{jk}$ are i.i.d. $N(0, 1)$. In many statistical problems, stochastic
signals can be represented in the form of (1.1) as noisy wavelet coefficients
with errors $\varepsilon z_{jk}$, or simply represented by a sequence of normal variables
as in (2.2) below. In this paper we consider estimation of the true wavelet
coefficients $\beta = \{\beta_{jk}\}$, that is, the normal means, with the $\ell_2$ risk

(1.2) $R^{(\varepsilon)}(\hat\beta, \beta) = \sum_{j=-1}^{\infty} \sum_{k=1}^{2^j \vee 1} E_\beta^{(\varepsilon)} (\hat\beta_{jk} - \beta_{jk})^2$

for estimates $\hat\beta = \{\hat\beta_{jk}\}$ based on $y$, where $E_\beta^{(\varepsilon)}$ is the expectation in model
(1.1).

We develop general empirical Bayes (GEB) estimators $\hat\beta^{(\varepsilon)} = \hat\beta^{(\varepsilon)}(y)$,
defined in Section 2, such that under certain mild conditions on the sequence
$\beta$ the risks of $\hat\beta^{(\varepsilon)}$ satisfy

(1.3) $R^{(\varepsilon)}(\hat\beta^{(\varepsilon)}, \beta) \approx R^{(\varepsilon,*)}(\beta) = \inf_{\hat\beta \in \mathcal{D}_s} R^{(\varepsilon)}(\hat\beta, \beta),$

where $\mathcal{D}_s$ is the class of all separable estimators of the form $\hat\beta_{jk} = h_j(y_{jk})$
with Borel $h_j$. We provide oracle inequalities, that is, upper bounds for the
regret $R^{(\varepsilon)}(\hat\beta^{(\varepsilon)}, \beta) - R^{(\varepsilon,*)}(\beta)$ for this adaptation to the ideal risk $R^{(\varepsilon,*)}(\beta)$.
Our oracle inequalities imply that the ideal adaptation (1.3) is uniform for
large collections $\mathcal{B}$ of classes $B$ of the unknown $\beta$, for example, Lipschitz,
Sobolev and Besov balls $B$ of all smoothness and shape indices and radii,
in the sense that for all $B \in \mathcal{B}$ the regret is uniformly of smaller order than
the minimax risk

(1.4) $\mathcal{R}^{(\varepsilon)}(B) = \inf_{\hat\beta} \sup_{\beta \in B} R^{(\varepsilon)}(\hat\beta, \beta).$

This uniform ideal adaptation implies: (1) the exact minimax adaptation

(1.5) $\sup\{R^{(\varepsilon)}(\hat\beta^{(\varepsilon)}, \beta) : \beta \in B\} = (1 + o(1))\,\mathcal{R}^{(\varepsilon)}(B)$

simultaneously for all Besov balls $B$, (2) adaptation to spatial inhomogeneity
of the signal [Donoho and Johnstone (1994a)], (3) the superefficiency of
the GEB estimators in convergence rates in all compact sets of $\beta$ in Besov
spaces with a finite secondary shape parameter and (4) dominance of GEB
estimators over other empirical Bayes (EB) or separable estimators in the
limit in all classes of $\beta$ nested between Besov balls of the same smoothness
index. We also describe more general block EB methods and implementation
of GEB estimators in nonparametric regression models with possibly
unknown variance.

The white noise model (1.1) is a wavelet representation of its original form
[cf. Ibragimov and Khas'minskii (1981)], in which one observes

(1.6) $Y(t) = \int_0^t f(u)\,du + \varepsilon W(t), \quad 0 \le t \le 1,$

where $f \in L_2[0, 1]$ is unknown and $W(\cdot)$ is a standard Brownian motion. In
this representation, $y_{jk} = \int \phi_{jk}(t)\,dY(t)$, $\beta_{jk} = \beta_{jk}(f) = \int f(t)\phi_{jk}(t)\,dt$, and
estimates

(1.7) $\hat f(t) = \sum_{j,k} \hat\beta_{jk}\phi_{jk}(t), \quad 0 \le t \le 1,$

are constructed based on estimates $\hat\beta_{jk}$ of $\beta_{jk}$, where $\phi_{jk}$ are wavelets forming
an orthonormal basis in $L_2[0, 1]$. Let $E_f^{(\varepsilon)}$ be the expectation in model (1.6).
By the Parseval identity,

(1.8) $R^{(\varepsilon)}(\hat\beta, \beta) = E_f^{(\varepsilon)} \sum_{j,k} (\hat\beta_{jk} - \beta_{jk})^2 = E_f^{(\varepsilon)} \int_0^1 \{\hat f(t) - f(t)\}^2\,dt,$

so that our problem is equivalent to the estimation of $f$ under the mean
integrated squared error (MISE). In general, $\phi_{jk}(t)$ are "periodic" or "boundary
adjusted" dilation and translation $2^{j/2}\phi(2^j t - k)$ of a "mother wavelet" $\phi$
of compact support for $j \ge j_0$ for certain $j_0 \ge 0$; see Donoho and Johnstone
(1994a). Since $\phi_{jk}$ is supported in an interval of size $O(1/2^j)$ in the vicinity
of $k/2^j$, $j$ and $k$ are, respectively, resolution and spatial indices, and $y_{jk}$
represent the information about the behavior of $f$ at resolution level $j$ and
location $k/2^j$. We refer to Chui (1992), Daubechies (1992) and Härdle, Kerkyacharian, Picard and Tsybakov
(1998) for wavelet theory and its applications.
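The Parseval identity (1.8) has a finite-sample analogue that is easy to check numerically. The sketch below is an illustration only: it uses a discrete orthonormal Haar transform (the function name `haar_transform` is ours, not the paper's) and verifies that the sum of squares is the same in the signal domain and in the coefficient domain. Because the transform is linear, the same identity applies to the error $\hat f - f$.

```python
import numpy as np

def haar_transform(v):
    """Orthonormal discrete Haar transform of a vector whose length is a power of 2.

    Returns [coarsest average, coarsest detail, ..., finest details].
    """
    approx = np.asarray(v, dtype=float).copy()
    n = approx.size
    assert n > 0 and n & (n - 1) == 0, "length must be a power of 2"
    details = []
    while approx.size > 1:
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2.0))  # detail coefficients
        approx = (even + odd) / np.sqrt(2.0)         # next coarser averages
    # details were collected fine-to-coarse; reverse to coarse-to-fine
    return np.concatenate([approx] + details[::-1])

f = np.array([1.0, 2.0, 3.0, 4.0])
coeffs = haar_transform(f)
# Energy in the signal domain equals energy in the coefficient domain.
print(np.sum(f**2), np.sum(coeffs**2))
```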

This paper is organized as follows. We develop block EB methods in Sec- 
tion 2 which naturally lead to GEB estimators. We state main properties 
of the GEB estimators in Section 3. We implement GEB estimators in non- 
parametric regression models in Section 4. We discuss related results and 
problems in Section 5. We focus on compound estimation of normal means 
in Section 6. We present our main theorems in their full strength in Section 7. 
We cover Bayes models and more general classes of the unknown (3 in Sec- 
tion 8. We study the equivalence between the nonparametric regression and 
white noise models in Section 9. Proofs are given in the Appendix unless oth- 
erwise stated or provided immediately after the statements of results. The 
main theorems in Sections 3, 6 and 7 have been reported earlier in Zhang 
(2000) with more details in proofs. We use the notation $\log_+ x = 1 \vee \log x$
and $x_{(n)} = (x_1, \ldots, x_n)$ throughout.

2. Block EB methods and GEB estimators. We begin with block EB
methodologies, which naturally lead to GEB estimators. Consider a sequence
of $N \le \infty$ decision problems with observations $X_k \sim p(x|\theta_k)$ and parameters
$\theta_k$ under the compound risk $\sum_{k=1}^{N} E L_0(\delta_k, \theta_k)$ for a given loss $L_0(\cdot, \cdot)$. Block
EB methods partition the sequence into blocks $[j] = (k_{j-1}, k_j]$, $k_{j-1} < k_j \le \infty$,
and apply EB procedures of the form $\delta_k = \hat t_{[j]}(X_k)$, $k \in [j]$, in individual
blocks, where $\hat t_{[j]}(\cdot)$ are estimates of the oracle rules

(2.1) $t^*_{[j]}(\cdot) = \arg\min_{t \in \mathcal{D}_0} \sum_{k \in [j]} E_{\theta_k} L_0(t(X_k), \theta_k)$

for a certain class $\mathcal{D}_0$ of decision rules. Block GEB (linear, threshold EB)
procedures approximate the oracle rules (2.1) corresponding to the classes
$\mathcal{D}_0$ of all Borel (linear, threshold) functions. It follows from compound
decision theory [Robbins (1951)] that $t^*_{[j]}$ are the Bayes rules when the priors
are taken to be the unknown empirical distributions of $\{\theta_k, k \in [j]\}$.

Consider the estimation of normal means $\beta_k$ based on independent
observations

(2.2) $y = \{y_k, k \le N\}, \quad y_k \sim N(\beta_k, \varepsilon^2)$

with known $\varepsilon$, under the squared loss as in (1.2). After standardization
with $(X_k, \theta_k) = (y_k, \beta_k)/\varepsilon$ to the unit variance, block GEB estimators of $\beta_k$
become

(2.3) $\hat\beta_k = \hat\beta_k^{(\varepsilon)}(y) = \varepsilon\,\hat t_{[j]}(y_k/\varepsilon), \quad k \in [j],$

where $\hat t_{[j]}$ are estimates of (2.1) with squared-error loss $L_0(\delta, \theta) = (\delta - \theta)^2$.
The empirical distribution of $\{\theta_k, k \in [j]\}$ is

(2.4) $G_{[j]}(u) = \frac{1}{n_j} \sum_{k \in [j]} I\{\theta_k \le u\}, \quad \theta_k = \beta_k/\varepsilon,$

where $n_j = k_j - k_{j-1}$ is the size of block $j$. Let $\varphi(x) = e^{-x^2/2}/\sqrt{2\pi}$ and

(2.5) $\varphi_G(x) = \varphi(x; G) = \int \varphi(x - u)\,dG(u), \quad \varphi'(x; G) = \frac{d}{dx}\varphi(x; G).$

The oracle rules (2.1) are explicit functionals of the mixture marginal
distributions $\varphi(x; G_{[j]})$ of the observations $\{X_k = y_k/\varepsilon, k \in [j]\}$ [Robbins (1956),
page 162, and Stein (1981)], given by

(2.6) $t^*_{[j]}(x) = x + \frac{\varphi'(x; G_{[j]})}{\varphi(x; G_{[j]})}.$

This formula motivated the GEB estimators of Zhang (1997).
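As a small worked illustration of (2.6), the sketch below evaluates the oracle rule when the prior $G$ is a known discrete distribution. The function names and the choice of a discrete $G$ are assumptions made for the example; in the paper $G_{[j]}$ is the unknown empirical distribution of the standardized means, which is why the rule has to be estimated.

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def oracle_rule(x, atoms, weights):
    """Evaluate t*(x) = x + phi'(x; G) / phi(x; G) for a discrete prior G.

    G puts mass weights[i] on atoms[i], so phi(x; G) = sum_i w_i * phi(x - u_i).
    """
    dens = sum(w * phi(x - u) for u, w in zip(atoms, weights))
    # d/dx phi(x - u) = -(x - u) * phi(x - u)
    deriv = sum(-w * (x - u) * phi(x - u) for u, w in zip(atoms, weights))
    return x + deriv / dens

# A point mass at 2 gives the constant rule t*(x) = 2.
print(oracle_rule(0.3, [2.0], [1.0]))
# A symmetric two-point prior at -1 and +1 gives the posterior mean tanh(x).
print(oracle_rule(0.7, [-1.0, 1.0], [0.5, 0.5]), math.tanh(0.7))
```

Both cases agree with the posterior mean computed directly, which is the sense in which (2.6) is the Bayes rule for the prior $G$.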

We construct GEB estimators in individual blocks using a hybrid version
of the GEB estimator of Zhang (1997). The hybrid GEB estimator utilizes
an estimate of the order of $\kappa(G_{[j]})$,

(2.7) $\kappa(G) = \int (|u|^2 \wedge 1)\,dG(u),$

and switches from the GEB estimator to a threshold estimator when $\kappa(G_{[j]})$
is small. Specifically, for certain $\rho(n) > 0$ and $b(n)$ given in (2.11) below and
$n_j > n_* \ge 2$, we define (2.3) by

(2.8) $\hat t_{[j]}(x) = \begin{cases} x + \hat\varphi'_{[j]}(x)/\max\{\rho(n_j), \hat\varphi_{[j]}(x)\}, & \text{if } \hat\kappa_{[j]} > b(n_j), \\ \mathrm{sgn}(x)\,(|x| - \sqrt{2\log n_j})_+, & \text{if } \hat\kappa_{[j]} \le b(n_j), \end{cases}$

where $\hat\varphi_{[j]}(x)$, a kernel estimate of $\varphi(x; G_{[j]})$ in (2.6), is given by

(2.9) $\hat\varphi_{[j]}(x) = \frac{1}{n_j} \sum_{k \in [j]} \sqrt{2\log n_j}\, K\big(\sqrt{2\log n_j}\,(x - X_k)\big)$

with $K(x) = \sin(x)/(\pi x)$ and $X_k = y_k/\varepsilon$, and $\hat\kappa_{[j]}$, an estimate of the order
of $\kappa(G_{[j]})$, is given by

$\hat\kappa_{[j]} = 1 - \frac{\sqrt{2}}{n_j} \sum_{k \in [j]} \exp(-X_k^2/2).$

For $n_j \le n_*$, we choose the MLE $\hat\beta_k = y_k$ [i.e., $\hat t_{[j]}(x) = x$ or $\rho(n) = \infty =
-b(n)$ for $n \le n_*$] or the James and Stein (1961) estimator for the vectors
$\{\beta_k, k \in [j]\}$.
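The hybrid rule (2.8) with the kernel estimate (2.9) can be sketched in a few lines. This is a minimal illustration under our reading of the formulas, not the author's implementation: the function names are ours, `rho` and `b` are passed in as plain numbers, and the derivative $\hat\varphi'_{[j]}$ is obtained by differentiating (2.9) term by term.

```python
import numpy as np

def kernel(u):
    """K(u) = sin(u) / (pi * u), with K(0) = 1 / pi."""
    return np.sinc(np.asarray(u, dtype=float) / np.pi) / np.pi

def kernel_deriv(u):
    """K'(u) = (u cos u - sin u) / (pi u^2), with K'(0) = 0."""
    u = np.asarray(u, dtype=float)
    small = np.abs(u) < 1e-4
    safe = np.where(small, 1.0, u)  # avoid 0/0; replaced by Taylor term below
    exact = (safe * np.cos(safe) - np.sin(safe)) / (np.pi * safe**2)
    return np.where(small, -u / (3.0 * np.pi), exact)

def kappa_hat(X):
    """Estimate of the order of kappa(G): 1 - sqrt(2) * mean(exp(-X^2 / 2))."""
    X = np.asarray(X, dtype=float)
    return 1.0 - np.sqrt(2.0) * np.mean(np.exp(-0.5 * X**2))

def hybrid_rule(x, X, rho, b):
    """Evaluate the block rule (2.8) at points x, given standardized data X."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    X = np.asarray(X, dtype=float)
    a = np.sqrt(2.0 * np.log(X.size))  # inverse bandwidth and threshold level
    if kappa_hat(X) <= b:
        # sparse block: soft thresholding at sqrt(2 log n)
        return np.sign(x) * np.maximum(np.abs(x) - a, 0.0)
    u = a * (x[:, None] - X[None, :])
    dens = a * kernel(u).mean(axis=1)            # kernel estimate (2.9)
    deriv = a**2 * kernel_deriv(u).mean(axis=1)  # its derivative in x
    return x + deriv / np.maximum(rho, dens)

rng = np.random.default_rng(0)
theta = np.where(rng.random(256) < 0.5, 3.0, 0.0)  # hypothetical means
X = theta + rng.standard_normal(256)
estimates = hybrid_rule(X, X, rho=0.1, b=0.3)
```

The sinc kernel can make the density estimate negative or very small, which is one reason the denominator in (2.8) is truncated below by $\rho(n_j)$.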

For denoising the wavelet coefficients, we identify the sequence $y = \{y_{-1,1}, y_{0,1}, y_{1,1}, y_{1,2}, \ldots\}$
in (1.1) with $y = \{y_k, k \le N\}$ in (2.2) and partition them into natural blocks
$[j] = (2^j, 2^{j+1}]$, $j = -1, 0, \ldots$, with a single block for each resolution level $j$.
This results in

(2.10) $\hat\beta_{jk} = \begin{cases} y_{jk}, & \text{if } j \le j_*, \\ \varepsilon\,\hat t_{[j]}(y_{jk}/\varepsilon), & \text{if } j > j_*, \end{cases}$

where $j_* = \max\{j : j \le (\log n_*)/\log 2\}$ and $\hat t_{[j]}$ is as in (2.8) with $n_j = 2^j$,

$\hat\varphi_{[j]}(x) = \frac{\sqrt{2j\log 2}}{2^j} \sum_{k=1}^{2^j} K\Big(\frac{x - y_{jk}/\varepsilon}{(2j\log 2)^{-1/2}}\Big), \quad K(x) = \frac{\sin(x)}{\pi x},$

and $\hat\kappa_{[j]} = 1 - 2^{-j} \sum_{k=1}^{2^j} \sqrt{2}\exp(-(y_{jk}/\varepsilon)^2/2)$. For definiteness, we set

(2.11) $\rho = \rho(n) = (1 + \eta_n)\rho_0\sqrt{2(\log n)/n}, \quad b = b(n) = b_0(\log n)/\sqrt{n},$

with certain $\eta_n \to 0$ and positive constants $\rho_0$ and $b_0$. We simply call (2.10)
GEB estimators since the blocks represent natural resolution levels in the
wavelet setting.
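The bookkeeping behind (2.10) and (2.11) can be made concrete as follows. This is a sketch under our reading of the (partly illegible) formulas: the helper names are ours, the defaults $\rho_0 = 0.4$, $\eta_n = 0$ and $b_0 = 2$ are the values used in the simulations of Section 4, and $n_*$ is assumed to be a positive integer.

```python
import math

def rho_level(n, rho0=0.4, eta=0.0):
    """rho(n) = (1 + eta_n) * rho0 * sqrt(2 log(n) / n), as read from (2.11)."""
    return (1.0 + eta) * rho0 * math.sqrt(2.0 * math.log(n) / n)

def b_level(n, b0=2.0):
    """b(n) = b0 * log(n) / sqrt(n), as read from (2.11)."""
    return b0 * math.log(n) / math.sqrt(n)

def j_star(n_star):
    """Largest integer j with 2^j <= n_star (n_star a positive integer)."""
    return int(n_star).bit_length() - 1

def level_blocks(J):
    """Map each level j = -1, 0, ..., J to its 0-based half-open index range
    in the flattened sequence y_{-1,1}, y_{0,1}, y_{1,1}, y_{1,2}, ...
    """
    blocks = {-1: (0, 1)}
    for j in range(0, J + 1):
        blocks[j] = (2**j, 2 ** (j + 1))  # level j holds 2^j coefficients
    return blocks

for j, (start, stop) in level_blocks(3).items():
    n_j = stop - start
    if j > j_star(4):  # levels above j_* are shrunk; the rest use y_jk itself
        print(j, n_j, rho_level(n_j), b_level(n_j))
```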

We discuss in detail in Section 6 the construction and properties of the
GEB estimators in individual blocks (resolution levels). Here we briefly
describe the rationale for our choices of the "tuning parameters" for (2.8)
and (2.10). The special kernel and bandwidth in (2.9) ensure that $\hat\varphi_{[j]}(x) \to
\varphi(x; G_{[j]})$ at nearly the optimal rate $n_j^{-1/2}$ uniformly and in derivatives as
$n_j \to \infty$ [Zhang (1997)]. The sample size $n_*$ (and thus the initial resolution
level $j_*$) should be determined so that $\hat\varphi_{[j]}(x) \approx \varphi(x; G_{[j]})$ with sufficient
accuracy for $n_j > n_*$. Although the $\rho(n_j)$ and $b(n_j)$ in (2.8) could be
determined/optimized by data-driven methods, for example, Stein's (1981)
estimator of mean squared error, bootstrap and cross validation, properties
of the resulting estimators are not clear. The choice in (2.11) provides the
sharpest bounds in our main theorems. Our risk bounds depend on $\rho_0$
and $b_0$ only through scaling constants in terms of smaller order than
minimax risks. Finally, we remark that (block) GEB estimators (2.3) and (2.10)
are scale equivariant:

(2.12) $\hat\beta^{(\varepsilon)}(y) = C\,\hat\beta^{(\varepsilon/C)}(y/C) \quad \forall\, C > 0,$

since $\hat t_{[j]}(x)$ in (2.8) depend on $y$ and $\varepsilon$ only through $y/\varepsilon$. Thus, for the risks
in (1.3) and all $C > 0$

(2.13) $R^{(\varepsilon)}(\hat\beta^{(\varepsilon)}(y), \beta) = C^2 R^{(\varepsilon/C)}(\hat\beta^{(\varepsilon/C)}(y/C), \beta/C),$
$\qquad R^{(\varepsilon,*)}(\beta) = C^2 R^{(\varepsilon/C,*)}(\beta/C).$



3. Oracle inequalities and their consequences. In this section we describe 
main properties of our (block) GEB estimators (2.3), (2.8) and (2.10) and the 
concepts of uniform ideal adaptivity, exactly adaptive minimaxity, spatial 
adaptivity and superefficiency. Sections 5, 7 and 8 contain further discussion 
about these properties and concepts. 



3.1. Oracle inequalities. Consider the estimation of normal means with
observations (2.2). An oracle expert with the knowledge of $t^*_{[j]}$ in (2.6) could
use the ideal separable rule $\varepsilon t^*_{[j]}(y_k/\varepsilon)$ for $\beta_k$ to achieve the ideal risk

(3.1) $R^{(\varepsilon,*)}(\beta) = \min_{\hat\beta \in \mathcal{D}_s} R^{(\varepsilon)}(\hat\beta, \beta) = \sum_j \min_t \sum_{k \in [j]} E_\beta^{(\varepsilon)}\{\varepsilon t(y_k/\varepsilon) - \beta_k\}^2,$

as in (1.3), where $\mathcal{D}_s$ is the collection of all separable estimates of the form
$\hat\beta_k = h_j(y_k)$, $\forall\, k \in [j]$. Although $\varepsilon t^*_{[j]}(y_k/\varepsilon)$ are not statistics, the ideal risk
(3.1) provides a benchmark for our problem.

Theorem 3.1. Let $\hat\beta^{(\varepsilon)} = \{\hat\beta_k^{(\varepsilon)}\}$ be as in (2.3) and (2.8) based on (2.2).
Let $R^{(\varepsilon)}(\hat\beta, \beta) = \sum_{k=1}^{N} E_\beta^{(\varepsilon)}(\hat\beta_k - \beta_k)^2$. Then there exists a universal constant
$M < \infty$ such that

R {£) (ft £ \(3)- R (£ '*\P) 

(3-2) 

< Me £ [n jrpA2 (n„ J + j . 

where $R^{(\varepsilon,*)}(\beta)$ is the ideal risk in (3.1), $n_j = k_j - k_{j-1}$ are block sizes and
(33)r (n C) = mm U C " max ( COw^ Y'™]) 

Corollary 3.1. If $\beta = 0$, then $R^{(\varepsilon,*)}(\beta) = 0$ and $R^{(\varepsilon)}(\hat\beta^{(\varepsilon)}, \beta) = O(\varepsilon^2)$.

Theorem 3.1, proved in Section 7.1, provides a crucial oracle inequality
in the derivation of our main results. It allows us to bound the regret of
our estimators in terms of the moments of $\beta_{[j]}$. Consider block sizes $n_j$ such
that, for all $p > 0$ and $\eta > 0$ and as $x \to \infty$,

tP rip- 
(3.4) £_ = (^), ]T^ = o(^), ^(l + log % )- 3 / 2 <oo. 

rij>x J rij<x j 

Condition (3.4) holds if $\log n_j \sim j^{\gamma}$ for certain $2/3 < \gamma \le 1$.

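Theorem 3.2. Let $\hat\beta^{(\varepsilon)}$ be as in Theorem 3.1 and $\|\beta\| = \sup_j n_j^{s}
\big(\sum_{k \in [j]} |\beta_k|^p / n_j\big)^{1/p}$. Suppose (3.4) holds and $\alpha_0 = \min\{s, 2s - 1/2 - 1/(p \wedge
2)\} > 0$. Then, for all $\eta > 0$,

$\sup\{R^{(\varepsilon)}(\hat\beta^{(\varepsilon)}, \beta) - R^{(\varepsilon,*)}(\beta) : \|\beta\| \le C\} = O\big(\varepsilon^{2\alpha_0/(\alpha_0 + 1/2) - \eta}\big)$ as $\varepsilon \to 0+$.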



Remark. If a higher threshold level $\sqrt{2(1 + A_0)\log n_j}$ with $A_0 > 0$ is
used in (2.8) for $\hat\kappa_{[j]} \le b(n_j)$, Theorems 3.1 and 3.2 hold with $(1 + \log n_j)^{-3/2}$
replaced by $n_j^{-A_0}(1 + \log n_j)^{-3/2}$ in (3.2) and (3.4). See the remark below
Theorem 6.4.

In the rest of Section 3, we focus on the wavelet model (1.1), that is, the
case of $n_j = 2^j$. Our methodology is clearly applicable to more general block
sizes $n_j$ satisfying (3.4).

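3.2. Uniform ideal adaptation. Let $R^{(\varepsilon)}(\hat\beta, \beta)$ be the $\ell_2$ risk in (1.2).
Statistical estimators $\hat\beta^{(\varepsilon)}$ are uniformly adaptive to the ideal risk $R^{(\varepsilon,*)}(\beta)$
in (1.3) and (3.1), with respect to a collection $\mathcal{B}$ of classes $B$ of the unknown
sequence $\beta$, if

(3.5) $\sup_{\beta \in B}\{R^{(\varepsilon)}(\hat\beta^{(\varepsilon)}, \beta) - R^{(\varepsilon,*)}(\beta)\} = o(1)\,\mathcal{R}^{(\varepsilon)}(B)$ as $\varepsilon \to 0+ \quad \forall\, B \in \mathcal{B},$

and $\hat\beta^{(\varepsilon)}$ depends on $(y, \varepsilon)$ only, not on $B$, where $\mathcal{R}^{(\varepsilon)}(B)$ is the minimax
risk in (1.4). In other words, uniform ideal adaptation demands that, for all
$B \in \mathcal{B}$ and in the minimax sense, the regret

(3.6) $r^{(\varepsilon)}(\hat\beta^{(\varepsilon)}, \beta) = R^{(\varepsilon)}(\hat\beta^{(\varepsilon)}, \beta) - R^{(\varepsilon,*)}(\beta)$

be uniformly of smaller order than the typical convergence rates in $B$. As
an immediate consequence of uniform ideal adaptation, maximum risks are
bounded by the maximum ideal risks,

(3.7) $\sup_{\beta \in B} R^{(\varepsilon)}(\hat\beta^{(\varepsilon)}, \beta) \le (1 + o(1)) \sup_{\beta \in B} R^{(\varepsilon,*)}(\beta) \quad \forall\, B \in \mathcal{B}.$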

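Our GEB estimators possess this uniform ideal adaptivity property with
respect to

(3.8) $\mathcal{B}_{\mathrm{Besov}} = \Big\{B^{\alpha}_{p,q}(C) : 0 < \alpha < \infty,\ \frac{1}{\alpha + 1/2} < p \le \infty,\ 0 < q \le \infty,\ 0 < C < \infty\Big\},$

where $B^{\alpha}_{p,q} = B^{\alpha}_{p,q}(C)$ are the Besov balls defined by

(3.9) $B^{\alpha}_{p,q} = \{\beta : \|\beta\|^{\alpha}_{p,q} \le C\}, \quad \|\beta\|^{\alpha}_{p,q} = \Big[|\beta_{-1,1}|^q + \sum_{j=0}^{\infty} \big(2^{j(\alpha + 1/2 - 1/p)}\|\beta_{[j]}\|_{p,2^j}\big)^q\Big]^{1/q},$

with $\|\beta_{[j]}\|_{p,2^j} = \big(\sum_{k=1}^{2^j} |\beta_{jk}|^p\big)^{1/p}$ and with the usual modifications for $p \vee
q = \infty$. For $p \wedge q < 1$, $\|\cdot\|^{\alpha}_{p,q}$ is not a norm, but $(\|\beta' + \beta''\|^{\alpha}_{p,q})^{p \wedge q} \le (\|\beta'\|^{\alpha}_{p,q})^{p \wedge q} +
(\|\beta''\|^{\alpha}_{p,q})^{p \wedge q}$ is sufficient here.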
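For readers who want to compute the sequence norm in (3.9) for a finite coefficient array, a minimal sketch is given below. It covers only finite $p$ and $q$ and a finite number of levels; the function name and the data layout (one array of $2^j$ coefficients per level $j \ge 0$) are assumptions for the example.

```python
import numpy as np

def besov_seq_norm(beta_coarse, levels, alpha, p, q):
    """Besov sequence norm (3.9) for finite p and q.

    beta_coarse is beta_{-1,1}; levels[j] holds the 2^j coefficients at level j.
    """
    s = alpha + 0.5 - 1.0 / p  # exponent of 2^j in (3.9)
    total = abs(beta_coarse) ** q
    for j, coeffs in enumerate(levels):
        coeffs = np.asarray(coeffs, dtype=float)
        lp_norm = np.sum(np.abs(coeffs) ** p) ** (1.0 / p)
        total += (2.0 ** (j * s) * lp_norm) ** q
    return total ** (1.0 / q)

# Level 1 holds (3, 4) with alpha = 1, p = q = 2: the norm is 2^1 * 5 = 10.
print(besov_seq_norm(0.0, [[0.0], [3.0, 4.0]], alpha=1.0, p=2.0, q=2.0))
```

With small $p$ the level-wise $\ell_p$ norm is dominated by a few large coefficients, which is how sparsity within a resolution level enters the definition.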



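Theorem 3.3. Let $\hat\beta^{(\varepsilon)} = \{\hat\beta_{jk}^{(\varepsilon)}\}$ be as in (2.10) based on $y = \{y_{jk}\}$
in (1.1), with $\rho(n)$ and $b(n)$ in (2.11). Then (3.5) holds for $\mathcal{B} = \mathcal{B}_{\mathrm{Besov}}$.

By Donoho and Johnstone [(1998), Theorem 1] and Theorem 7.3 below,
the minimax convergence rates in Besov balls are given by

(3.10) $0 < \inf_{0 < \varepsilon \le C} \frac{\mathcal{R}^{(\varepsilon)}(B^{\alpha}_{p,q}(C))}{\varepsilon^{2\alpha/(\alpha + 1/2)} C^{1/(\alpha + 1/2)}} \le \sup_{0 < \varepsilon \le C} \frac{\mathcal{R}^{(\varepsilon)}(B^{\alpha}_{p,q}(C))}{\varepsilon^{2\alpha/(\alpha + 1/2)} C^{1/(\alpha + 1/2)}} < \infty.$

Based on (3.10), Theorem 3.3 is an immediate consequence of Theorems
3.2 and 7.2 in Section 7, which provide upper bounds for the convergence
rates of the $o(1)$ in (3.5). Note that $\alpha_0 > \alpha = s - 1/2$ in Theorem 3.2 for
$s > 1/p$. We show in Section 8 that (3.5) and (1.5) hold for much larger
collections than $\mathcal{B}_{\mathrm{Besov}}$.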
3.3. Adaptive minimaxity. A main consequence of the uniform ideal
adaptivity in Theorem 3.3 is the universal exactly adaptive minimaxity over all
Besov balls.

Theorem 3.4. Let $\hat\beta^{(\varepsilon)} = \{\hat\beta_{jk}^{(\varepsilon)}\}$ be as in (2.10) and (2.11) with positive
constants $(j_*, \rho_0, b_0)$. Then (1.5) holds for the Besov balls $B^{\alpha}_{p,q}(C)$ in (3.9)
for all $(\alpha, p, q, C)$ in (3.8).

This result can be viewed as an extension of the work of Efromovich and Pinsker
(1984, 1986), Efromovich (1985) and Golubev (1992) from Sobolev versions
of $B^{\alpha}_{2,2}$ to Besov balls with general $(\alpha, p, q)$. Theorem 3.4 follows from
Theorem 7.4 in Section 7, which provides upper bounds for the order of the $o(1)$
in (1.5).

For a general collection $\mathcal{B}$, exact adaptive minimaxity (1.5) is a
consequence of (3.5) and

(3.11) $\sup_{\beta \in B} R^{(\varepsilon,*)}(\beta) = (1 + o(1))\,\mathcal{R}^{(\varepsilon)}(B),$

since (3.5) implies (3.7). For Besov balls $B = B^{\alpha}_{p,q}$, (3.11) is proved in Donoho and Johnstone
(1998) for $q \ge p$ and in Theorem 7.3 for general $(p, q)$.

3.4. Spatial adaptation. Another main consequence of the uniform ideal
adaptivity in Theorem 3.3 is spatial adaptivity of (1.7) when $\beta = \beta(f)$
represents wavelet coefficients of a spatially inhomogeneous signal function $f(\cdot)$.
For $\beta \in B^{\alpha}_{p,q}$, the smoothness index $\alpha$ indicates the typical rate of decay of
$|\beta_{jk}|$ as $j \to \infty$. Donoho and Johnstone (1994a) and Donoho, Johnstone, Kerkyacharian and Picard
(1995) pointed out that spatial inhomogeneity of a function $f$ is often
reflected in the sparsity of its wavelet coefficients $\beta_{jk} = \beta_{jk}(f)$ at individual
resolution levels, not necessarily in the smoothness index $\alpha$. In such cases,
a handful of $|\beta_{jk}|$ could be much larger than the overall order of magnitude
of the coefficients at individual resolution levels, so that $\beta \in B^{\alpha}_{p,q}$ only for small
$p < 2$. Thus, spatial adaptation can be achieved via (exactly, rate or nearly)
adaptive minimaxity in Besov balls with small shape parameter $p$. Our GEB
estimators are spatially adaptive to the full extent in the sense that they are
exactly adaptive minimax in Besov balls for all $(\alpha, p, q)$, under the minimum
condition $p > 1/(\alpha + 1/2)$, even allowing $p < 1$.

Example 3.1. Let $\mathcal{F}_{d,m}(C)$ be the collection of all piecewise polynomials
$f$ of degree $d$ in $[0, 1]$, with at most $m$ pieces and $\|f\|_\infty \le C$. Let
$\phi$ be a mother wavelet with $\int x^j \phi(x)\,dx = 0$, $j = 0, \ldots, d$, and $\phi(x) = 0$
outside an interval $I_0$ of length $|I_0|$. For $f \in \mathcal{F}_{d,m}(C)$, the wavelet coefficients
$\beta_{jk} = \beta_{jk}(f) = 2^{j/2} \int f(x)\phi(2^j x - k)\,dx = 0$ if $f$ is a single piece
of polynomial in $(I_0 + k)/2^j$ and $|\beta_{jk}| \le 2^{-j/2} C \int |\phi|\,dx$ otherwise. Thus,
$\|\beta_{[j]}\|_{p,2^j} \le 2^{-j/2} m^{1/p} C M$ for all $j$ and $p$, where $M = (|I_0| + 2)^{1/p} \int |\phi|\,dx$.
By (3.9), ||/9||p )9 < oo if a < 1/p for g < oo or a = 1/p for g = oo. Theo- 
rem 3.4, (3.10) and (1.8) imply that Ef /*(/ - /) 2 = ,/?(/)) = 
(9^ e 2a /( Q + 1 / 2 )) f or a n q, < Moreover, Theorem 8.1 in Section 8 implies 
that for = o{l){\oge)~ 2 e' l ^ a+1 l 2 \ 

limsup£- 2Q /( a+1 / 2 ) sup{R^0^,P(f)) :/ G ^ m(£) (e~ M )} = VM< oo, 
with the radii C = in (8.4). 

3.5. Superefficiency. An interesting phenomenon with our GEB estimators
is their universal superefficiency in convergence rates in compact sets
in Besov spaces with $q < \infty$.

Theorem 3.5. Let $\hat\beta^{(\varepsilon)} = \{\hat\beta_{jk}^{(\varepsilon)}\}$ be as in (2.10) and (2.11) with positive
constants $(j_*, \rho_0, b_0)$. Let $0 < \alpha < \infty$, $1/(\alpha + 1/2) < p \le \infty$ and $0 < q < \infty$.
Then $\lim_{\varepsilon \to 0+} \varepsilon^{-2\alpha/(\alpha + 1/2)} R^{(\varepsilon)}(\hat\beta^{(\varepsilon)}, \beta) = 0$ for $\|\beta\|^{\alpha}_{p,q} < \infty$, and for
$\|\cdot\|^{\alpha}_{p,q}$-compact sets $B$

(3.12) $\lim_{\varepsilon \to 0+} \varepsilon^{-2\alpha/(\alpha + 1/2)} \sup\{R^{(\varepsilon)}(\hat\beta^{(\varepsilon)}, \beta) : \beta \in B\} = 0.$

Theorem 3.5 is proved at the end of Section 7. It indicates that the minimax
risks $\mathcal{R}^{(\varepsilon)}(B^{\alpha}_{p,q}) \sim \varepsilon^{2\alpha/(\alpha + 1/2)}$ are quite conservative as measurements of
the risk of our GEB estimators. As a function of $\beta$, the ideal risk $R^{(\varepsilon,*)}(\beta)$
provides more accurate information about the actual risk; see Theorems
3.2 and 7.2. Brown, Low and Zhao (1997) constructed universal pointwise
superefficient estimators for Sobolev spaces (i.e., $p = 2$). Their method also
provides the superefficiency of the estimators of Efromovich and Pinsker
(1984, 1986). The classical kernel and many other smoothing methods do
not possess the superefficiency property. In parametric models superefficiency
could possibly happen only in a very small part of the parameter
space, while the superefficiency of the GEB estimators is universal in all
Besov balls.

3.6. Dominance of GEB methods. Consider classes $B$ of the unknown $\beta$
satisfying

(3.13) $B \subseteq B^{\alpha}_{p,q}(C), \quad \liminf_{\varepsilon \to 0+} \varepsilon^{-2\alpha/(\alpha + 1/2)}\,\mathcal{R}^{(\varepsilon)}(B) > 0,$

for certain $(\alpha, p, q, C)$ in (3.8), where $\mathcal{R}^{(\varepsilon)}(B)$ is the minimax risk (1.4). It
follows from Theorem 3.4 that our GEB estimators achieve the minimax rate
of convergence in $B$, but they may not achieve the minimax constant for $B$
in the limit. We show here that the GEB estimators dominate restricted EB
estimators within $o(1)\varepsilon^{2\alpha/(\alpha + 1/2)}$ in risk in all classes $B$ satisfying (3.13).

Let $\tilde R^{(\varepsilon,*)}(\beta)$ be certain "ideal risk" with $\tilde R^{(\varepsilon,*)}(\beta) \ge R^{(\varepsilon,*)}(\beta)$ and
consider $\tilde\beta^{(\varepsilon)}$ satisfying

(3.14) $\sup_{\beta \in B}\{\tilde R^{(\varepsilon,*)}(\beta) - R^{(\varepsilon)}(\tilde\beta^{(\varepsilon)}, \beta)\} \le o(1)\varepsilon^{2\alpha/(\alpha + 1/2)}.$

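Theorem 3.6. Let $\hat\beta^{(\varepsilon)} = \{\hat\beta_{jk}^{(\varepsilon)}\}$ be as in (2.10) and (2.11) with positive
constants $(j_*, \rho_0, b_0)$. Let $\mathcal{R}^{(\varepsilon)}(B)$ be the minimax risk in (1.4). Suppose
(3.13) and (3.14) hold. Then

(3.15) $\limsup_{\varepsilon \to 0+} \frac{\sup\{R^{(\varepsilon)}(\hat\beta^{(\varepsilon)}, \beta) - R^{(\varepsilon)}(\tilde\beta^{(\varepsilon)}, \beta) : \beta \in B\}}{\mathcal{R}^{(\varepsilon)}(B)} \le 0.$

Consequently, $\limsup_{\varepsilon \to 0+} \{\sup_{\beta \in B} R^{(\varepsilon)}(\hat\beta^{(\varepsilon)}, \beta)\}/\{\sup_{\beta \in B} R^{(\varepsilon)}(\tilde\beta^{(\varepsilon)}, \beta)\} \le 1$.

Theorem 3.6 is an immediate consequence of Theorem 3.3. Condition
(3.13) holds if $B = \{\beta : \|\beta\| \le C\}$ are balls for a certain norm $\|\cdot\|$ nested
between two Besov norms with $M^{-1}\|\beta\|^{\alpha}_{p',q'} \le \|\beta\| \le M\|\beta\|^{\alpha}_{p,q}$ for a certain
$0 < M < \infty$, for example, Lipschitz and Sobolev classes. Examples of $\tilde\beta^{(\varepsilon)}$
satisfying (3.14) include the Johnstone and Silverman (1998, 2005)
parametric EB, block threshold (e.g., VISU- and SURESHRINK) and linear
(e.g., James-Stein) estimators with

(3.16) $\tilde R^{(\varepsilon,*)}(\beta) = \sum_j \inf_{t \in \mathcal{D}_0} \sum_k E_\beta^{(\varepsilon)}(t(y_{jk}) - \beta_{jk})^2$

for restricted classes $\mathcal{D}_0$ (e.g., threshold, linear) of functions $t(\cdot)$.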

4. Nonparametric regression. In this section we describe implementation
of our GEB estimators in the nonparametric regression model

(4.1) $Y_i = f(t_i) + e_i, \quad e_i \sim N(0, \sigma^2), \quad i \le N.$

We report some simulation results for $t_i = i/N$ and unknown variance $\sigma^2$,
and present the exact adaptive minimaxity and superefficiency of GEB
estimators for i.i.d. uniform $t_i$ and known $\sigma^2$.

4.1. Deterministic design and simulation results. The white noise model
(1.1) is directly connected to the nonparametric regression model via discrete
wavelet reconstruction. Suppose $t_i = i/N$ and $N = 2^{J+1}$ in (4.1). A discrete
wavelet reconstruction can be expressed by invertible linear mappings

(4.2) $(y_{jk},\ k \le 2^{j \vee 0},\ j \le J) = N^{-1/2}\,\mathcal{W}_{N \times N}(Y_i,\ i \le N), \quad Y_i = \sqrt{N} \sum_{j=-1}^{J} \sum_{k=1}^{2^{j \vee 0}} y_{jk} W_{jk}(i),$

where $\mathcal{W}_{N \times N}$, called the finite wavelet transformation matrix, is a real
orthonormal matrix, $W_{jk}(i)$ specify the inverse of $\mathcal{W}_{N \times N}$, and $\sqrt{N} W_{jk}(i) \approx
\phi_{jk}(t_i)$ with wavelets $\phi_{jk}$. It follows that $y_{jk}$ are independent normal
variables with $E y_{jk} \approx \int f\phi_{jk}$ and $\mathrm{Var}(y_{jk}) = \varepsilon^2 = \sigma^2/N$. See Donoho and Johnstone
(1994a, 1995) for details.

Fig. 1. Signals; clockwise from top left. Blocks, Bumps, HeaviSine, Doppler.

Although the variance $\varepsilon^2$ can be fully identified, that is, estimated
without error, based on data in (1.1) or (1.6) for square summable $\beta$, that is,
$\int f^2 < \infty$, implementation of GEB estimators in the nonparametric
regression model (4.1) requires an estimation of the variance $\sigma^2$. Among other
methods, estimates of $\sigma^2$ can be constructed from observations at the
highest resolution level, for example,

(4.3) $\hat\sigma = \mathrm{MAD}(\sqrt{N}\,y_{[J]}) = \frac{\mathrm{median}(\sqrt{N}|y_{Jk}| : 1 \le k \le 2^J)}{\mathrm{median}(|N(0, 1)|)},$

which converges to $\sigma$ at the rate $N^{-\alpha p/(p+1)} + N^{-1/2}$ in Besov balls. The
regression function $f$ is then estimated by

(4.4) $\hat f(i/N) = \sum_{j=-1}^{J} \sum_{k=1}^{2^{j \vee 0}} \hat\beta_{jk}\sqrt{N}\,W_{jk}(i)$

via (4.2), where $\hat\beta_{jk}$ are as in (2.10) and (2.11).
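The MAD estimate (4.3) is a one-line computation once the finest-level coefficients are available. The sketch below uses only the Python standard library; the function name and the argument layout are ours, and the normalizing constant is the median of $|N(0,1)|$, that is, the 0.75 quantile of the standard normal distribution.

```python
import math
from statistics import NormalDist, median

def mad_sigma(finest_coeffs, N):
    """MAD estimate of sigma from the finest-level coefficients y_{Jk}, as in (4.3)."""
    scale = NormalDist().inv_cdf(0.75)  # median of |N(0, 1)|, about 0.6745
    return median(math.sqrt(N) * abs(y) for y in finest_coeffs) / scale

# Three hypothetical finest-level coefficients with N = 4 observations.
print(mad_sigma([-1.0, 2.0, -3.0], N=4))
```

Using the median rather than the mean of the squared coefficients keeps the estimate stable when a few finest-level coefficients carry signal.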

Now we report some simulation results to illustrate the performance of
our GEB estimators. Figure 1 plots four examples of regression functions in
Donoho and Johnstone (1994a). Normal errors are added to these functions,
with signal-to-noise ratio 7, and the resulting response variables $Y_i$, as in
(4.1), are plotted against $t_i = i/N$ in Figure 2, with sample size $N = 2048$.
Figure 3 reports the GEB estimates (4.4) based on the data in Figure 2, with
$j_* = 6$, $\rho_0 = 0.4$, $\eta_n = 0$ and $b_0 = 2$ in (2.10) and (2.11). Figure 4 reports the
reconstructions of these regression functions using SURESHRINK in S-plus
[Donoho and Johnstone (1995)], also based on the data in Figure 2. The
GEB and SURESHRINK estimates look similar in these examples.

4.2. Random design. Now consider (4.1) with i.i.d. uniform $t_i$ in $[0, 1]$.
We implement GEB methods with Haar basis and provide their optimality
properties.

Let $1_{j,k}(x) = I\{(k - 1)/2^j < x \le k/2^j\}$. The Haar wavelets are $\phi_{j,k} =
\sqrt{2^j}(1_{j+1,2k-1} - 1_{j+1,2k})$, $j \ge 0$, and $\phi_{-1,1} = 1$, and the corresponding wavelet
coefficients are

(4.5) $\beta_{j,k} = \beta_{j,k}(f) = \begin{cases} (f_{j+1,2k-1} - f_{j+1,2k})/(2\sqrt{2^j}), & j \ge 0, \\ f_{0,1}, & j = -1, \end{cases}$

where $f_{j,k} = 2^j \int f\,1_{j,k}$. Let $N_{j,k} = \sum_i 1_{j,k}(t_i)$ and $\bar Y_{j,k} = \sum_i Y_i 1_{j,k}(t_i)/N_{j,k}$.
Define

(4.6) $y_{j,k} = \frac{\delta_{j,k}(\bar Y_{j+1,2k-1} - \bar Y_{j+1,2k})}{\sqrt{N}(1/N_{j+1,2k-1} + 1/N_{j+1,2k})^{1/2}}, \quad j \ge 0, \qquad y_{-1,1} = \bar Y_{0,1},$

where $\delta_{j,k} = I\{N_{j+1,2k-1}N_{j+1,2k} > 0 \text{ or } j = -1\}$. Conditionally on $\{t_i\}$, $y_{j,k}$
are naive estimates of $\beta_{j,k}$ for $\delta_{j,k} = 1$, standardized to have variance $\varepsilon^2 =
\sigma^2/N$, and $y_{j,k} = 0$ for $\delta_{j,k} = 0$. In fact, conditionally on $\{t_i\}$, $y_{j,k}$ are
independent $N(\tilde\beta_{j,k}, \delta_{j,k}\varepsilon^2)$ variables with

(4.7) $\tilde\beta_{j,k} = \frac{\delta_{j,k}(\bar f_{j+1,2k-1} - \bar f_{j+1,2k})}{\sqrt{N}(1/N_{j+1,2k-1} + 1/N_{j+1,2k})^{1/2}}, \quad j \ge 0, \qquad \tilde\beta_{-1,1} = \bar f_{0,1},$

where $\bar f_{j,k} = \sum_i f(t_i)1_{j,k}(t_i)/N_{j,k}$. By the strong law of large numbers, $\tilde\beta_{j,k} \to \beta_{j,k}$
as $N \to \infty$.

Fig. 2. Signals + noise with N = 2048; signal-to-noise ratio is 7.

Fig. 3. GEB estimate of signals using S8 wavelets; $j_* = 6$, $\rho_0 = 0.4$, $\eta_n = 0$, $b_0 = 2$.

Fig. 4. SURESHRINK reconstruction using S8 wavelets.
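The construction in (4.6) can be sketched directly from the cell counts and cell means. The code below is an illustration under our reading of the formula: the function name is ours, it handles a single level $j \ge 0$, and it returns both the standardized differences $y_{j,k}$ and the indicators $\delta_{j,k}$.

```python
import numpy as np

def haar_random_design_coeffs(t, Y, j):
    """Compute (y_{j,k}, delta_{j,k}) for k = 1, ..., 2^j at a level j >= 0, as in (4.6)."""
    t = np.asarray(t, dtype=float)
    Y = np.asarray(Y, dtype=float)
    N = t.size
    m = 2 ** (j + 1)  # number of cells ((k-1)/m, k/m] at level j + 1
    cell = np.clip(np.ceil(t * m).astype(int) - 1, 0, m - 1)
    counts = np.bincount(cell, minlength=m)            # N_{j+1,k}
    sums = np.bincount(cell, weights=Y, minlength=m)   # cell sums of Y_i
    n_left, n_right = counts[0::2], counts[1::2]
    delta = (n_left * n_right) > 0                     # both child cells nonempty
    y = np.zeros(2**j)
    mean_left = sums[0::2][delta] / n_left[delta]
    mean_right = sums[1::2][delta] / n_right[delta]
    scale = np.sqrt(N) * np.sqrt(1.0 / n_left[delta] + 1.0 / n_right[delta])
    y[delta] = (mean_left - mean_right) / scale        # variance sigma^2 / N
    return y, delta.astype(int)

t = [0.1, 0.2, 0.7, 0.9]
Y = [1.0, 3.0, 5.0, 9.0]
print(haar_random_design_coeffs(t, Y, j=0))  # one coefficient, both halves occupied
print(haar_random_design_coeffs(t, Y, j=1))  # first coefficient has an empty cell
```

Dividing by $\sqrt{N}(1/N_{j+1,2k-1} + 1/N_{j+1,2k})^{1/2}$ gives every coefficient with $\delta_{j,k} = 1$ the same conditional variance $\sigma^2/N$, which is what allows the equal-variance rule (2.8) to be applied within a level.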

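The statistics $\{y_{j,k}, \delta_{j,k}\}$ in (4.6) are sufficient. Since the data contains
no information about $\beta_{j,k}$ for $\delta_{j,k} = 0$, we estimate $\beta_{[j]} = \{\beta_{j,k} : \delta_{j,k} = 1\}$ by
GEB based on $y_{[j]} = \{y_{j,k} : \delta_{j,k} = 1\}$,

(4.8) $\hat\beta_{j,k} = y_{j,k} I\{j \le j_*\} + \delta_{j,k}\,\varepsilon\,\hat t_{[j]}(y_{j,k}/\varepsilon) I\{j_* < j \le J\},$

where $\hat t_{[j]}$ is as in (2.8) with $n_j = \sum_k \delta_{j,k}$, $\rho(n)$ and $b(n)$ in (2.11) and

$\hat\varphi_{[j]}(x) = \frac{\sqrt{2\log n_j}}{n_j} \sum_{k=1}^{2^j} \delta_{j,k} K\Big(\frac{x - y_{j,k}/\varepsilon}{(2\log n_j)^{-1/2}}\Big), \quad \hat\kappa_{[j]} = 1 - \frac{\sqrt{2}}{n_j} \sum_{k=1}^{2^j} \delta_{j,k} e^{-(y_{j,k}/\varepsilon)^2/2}.$

We estimate $f$ by (1.7) via the Parseval identity (1.8).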

The following theorem asserts the exactly adaptive minimaxity and
superefficiency of GEB estimators in Besov balls. Let $\bar f_j(x) = \sum_{\ell=-1}^{j-1} \sum_k \beta_{\ell,k}\phi_{\ell,k}(x)$
be the piecewise average of $f$ at resolution level $j$. For Haar coefficients (4.5),
the Besov norm in (3.9) can be written as

$\|f\|^{\alpha}_{p,q} = \Big\{|\bar f_0|^q + \sum_{j=0}^{\infty} 2^{j\alpha q}\Big(\int_0^1 |\bar f_{j+1} - \bar f_j|^p\Big)^{q/p}\Big\}^{1/q}.$


Theorem 4.1. Let ||/|| = (Jq f 2 ) 1/2 and (a,p) satisfy a 2 /(a + 1/2) > 
l/p — 1/2. LetEf be the expectation in (4.1) under which ti are i.i.d. uniform 
variables in (0, 1). Let f = /jy be as in (1.7) based on = in (4.8), with 

the cut-off resolution levels J = Jjy satisfying 1/logiV < t]n = 2 J+1 /N = 
0(1). Then, for all function classes T = {/ : g < C} 

(4.9) sap E f \\f N - /|| 2 = (1 + Cn) MsupEfWf- f\\ 2 ~ AT^C 2 ^ 1 ) 

with Cn = o(l), provided that a 2 /{a + 1/2) > 1/jp — 1/2 and r/jy = o(l). 
Moreover, i/ ce 2 /(a + 1/2) > 1/p - 1/2 orrQ 1 = 0(l), then (4.9) holds with 
Cn = 0(1) anc ^ for all \\ (3 {f)\\p p - compact classes J 7 

(4.10) su P E f \\f N - /|| 2 = o(l)N~ 2a ^ 2a+1 l 
For $\delta_{j,k} = 0$, the $N$ observations in (4.1) contain no information about
$\beta_{j,k}$ in (4.5). For $2^J > N$, this happens for at least half of $\beta_{J,k}$. Thus, the
minimax MISE is at least of the order $\max_{f \in \mathcal{F}} \sum_{j,k} \beta_{jk}^2(f) I\{\delta_{j,k} = 0\} \sim
N^{-2(\alpha + 1/2 - 1/p)}$ in Besov classes $\mathcal{F}$ in (4.9). It follows that the condition
$\alpha^2/(\alpha + 1/2) \ge 1/p - 1/2$, that is, $\alpha + 1/2 - 1/p \ge \alpha/(2\alpha + 1)$, is
necessary for (4.9). Theorems 4.1 and 9.1 are proved together at the end of the
Appendix.

5. Related problems. Although the focus of this paper is on the white
noise model, our methods have much broader consequences in nonparametric
problems and their applications. In addition to the direct implementations
in nonparametric regression models in Section 4, the connections
between the white noise model and a number of experiments have
been recently established in the form of global asymptotic equivalence. This
was done by Brown and Low (1996), Donoho and Johnstone (1998) and
Brown, Cai, Low and Zhang (2002) for nonparametric regression, by Nussbaum
(1996) for the nonparametric density problem and by Grama and Nussbaum
(1998) for nonparametric generalized linear models. The impact of such
equivalence results is that statistical procedures derived in the white noise
model, including those in this paper, can be translated into asymptotically
analogous procedures in all other asymptotically equivalent problems. Adaptive
estimation in the white noise model (1.1) is also closely related to statistical
model selection [cf. Foster and George (1994) and Barron, Birgé and Massart
(1999)] and to information theory [cf. Foster, Stine and Wyner (2002)].

There has recently been a spate of papers on adaptive wavelet-based
nonparametric methods; see Donoho and Johnstone (1994a, 1995), Donoho, Johnstone, Kerkyacharian and Picard
(1995) and Juditsky (1997) on wavelet thresholding in the white noise and
nonparametric regression models, Johnstone, Kerkyacharian and Picard (1992)
and Donoho, Johnstone, Kerkyacharian and Picard (1996) on related methods
in density estimation, Hall, Kerkyacharian and Picard (1998, 1999) and
Cai (1999) on block threshold estimators, Abramovich, Benjamini, Donoho and Johnstone
(2000) on thresholding based on the false discovery rate, and the recent
book of Härdle, Kerkyacharian, Picard and Tsybakov (1998). Adaptive kernel
methods were considered by Lepski, Mammen and Spokoiny (1997). These
estimators are either nearly adaptive minimax with an extra logarithmic factor
in maximum risk in Besov balls (3.9) or rate adaptive for restricted values
of $\alpha$ and $p$, for example, $\alpha + 1/2 - 1/p > \{(1/p - 1/2)_+ + \gamma - 1/2\}_+$ in the
white noise model, $0 < \gamma < 1/2$, and $\alpha > 1/p$ and $p > 1$ in nonparametric
regression and density problems. This naturally raised the question of the
existence of fully rate adaptive estimators for all Besov balls in (3.8), to
which Theorem 3.4 provides a positive sharper answer: adaptation to the
minimax constants. Cai (2000) pointed out that such sharp adaptation cannot
be achieved by separable estimators. The practical value of adaptation
for $\alpha < 1/p$ and $p < 1$ is clearly seen from Example 3.1 and Theorem 4.1 and
will be further discussed in Section 8. Spatially adaptive methods were also 
considered by Breiman, Friedman, Olshen and Stone (1984) and Friedman 
(1991). Johnstone and Silverman (1998, 2004, 2005) proposed a parametric 
EB approach based on the posterior median for Gaussian errors with respect 
to a prior as the mixture of the point mass at zero and a given symmetric 
distribution (e.g., double exponential), with a modified MLE for the mixing 
probability. Their methods are rate adaptive minimax in all Besov balls and 
provide stable threshold levels for sparse and dense signals. 

Our strategy is to translate high- and infinite-dimensional estimation 
problems into estimating a sequence of normal means and use block EB 
methods to derive adaptive estimators. Within each block, one may use 
general [Robbins (1951, 1956)], linear [Stein (1956), James and Stein (1961) 
and Efron and Morris (1973)] or other restricted EB methods. From this 
point of view, the estimator of Efromovich and Pinsker (1984) is block lin- 
ear EB, while those of Donoho and Johnstone (1995) are block threshold 
EB. In the wavelet setting, restricted EB could yield exactly adaptive minimax
estimators in Besov balls with a fixed primary shape parameter $p$, if
$\mathcal{D}_0 \supseteq \{t^*_{p,c} : c > 0\}$, in view of the difference between (3.1) and (3.16), where
$t^*_{p,c}$ is the minimax Bayes rule for the class of priors $\{G : \int |\theta|^p\,dG(\theta) \le c^p\}$.
But this is not practical, since the explicit form of $t^*_{p,c}$ is intractable for $p < 2$.
In particular, for $p < 2$ the Bayes rules $t^*_{p,c}$ are nonlinear analytic functions,
so that linear and threshold estimators do not achieve exact asymptotic
minimaxity; see Donoho and Johnstone (1994a, 1998), (3.15) and (7.1) at the
resolution level $2^j \sim \varepsilon^{-1/(\alpha + 1/2)}$. We further refer to Morris (1983), Robbins
(1983) and Berger (1985) for general discussion about EB and Bayes methods.

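Adaptive minimax estimation has a number of interpretations. Define

$\tau(\varepsilon; \hat\beta, B) = \frac{\sup_{\beta \in B} R^{(\varepsilon)}(\hat\beta, \beta)}{\mathscr{R}^{(\varepsilon)}(B)},$

where $\mathscr{R}^{(\varepsilon)}(B)$ is the minimax risk in (1.4). Given estimators $\hat\beta^{(\varepsilon)}$ and a
collection $\mathscr{B}$ of sets $B$ in the parameter space, exactly adaptive minimaxity
means $\tau(\varepsilon; \hat\beta^{(\varepsilon)}, B) \to 1$ as $\varepsilon \to 0+$ for all $B \in \mathscr{B}$, rate adaptive
minimaxity means $\tau(\varepsilon; \hat\beta^{(\varepsilon)}, B) = O(1)$, and nearly adaptive minimaxity means that
$\tau(\varepsilon; \hat\beta^{(\varepsilon)}, B)$ is slowly varying in $\varepsilon$, with the obvious change of notation
$\varepsilon \leftrightarrow \sigma/\sqrt{n}$ for nonparametric regression and density estimation
problems. In the wavelet setting, rate and nearly adaptive minimax estimators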
were derived in Hall and Patil (1995, 1996) and Barron, Birge and Massart 
(1999), and block James-Stein estimators were recently investigated by 
Cavalier and Tsybakov (2001, 2002), in addition to papers cited above. There 
is a vast literature on nonparametric estimation methods, and asymptotic
minimaxity and adaptivity have been commonly used to judge the overall
performance of estimators; see comprehensive reviews in Stone (1994),
Donoho, Johnstone, Kerkyacharian and Picard (1995) and Barron, Birge and Massart 
(1999), and recent books by Efromovich (1999) and Hastie, Tibshirani and Friedman 
(2001). 

6. Compound estimation of normal means. Let $(X_k, \theta_k)$, $1 \le k \le n$, be
random vectors and let $P_{\theta_{(n)}}$ be the conditional probability given
$\theta_{(n)} = (\theta_1, \ldots, \theta_n)$ under $P$. Write $P = P_{\theta_{(n)}}$ when $\theta_{(n)}$ is
deterministic. Suppose $X_k$ are independent $N(\theta_k, 1)$ variables under the
conditional probability $P_{\theta_{(n)}}$. In this section we consider the estimation of
$\theta_k$ under the compound squared error loss $\sum_{k=1}^n (\hat\theta_k - \theta_k)^2$, that is,
the estimation of normal means within a single block or resolution level
based on (2.2) or (1.1), scaled to the unit variance.

Let $X \sim N(\theta, 1)$ under $P_\theta$. Define the Bayes risks for Borel $t(\cdot)$ and their
minimum by

(6.1) $R(t, G) = \int E_\theta (t(X) - \theta)^2 \, dG(\theta), \qquad R^*(G) = \inf_t R(t, G).$

As pointed out by Robbins (1951), the compound mean squared error for
the use of $\hat\theta_k = t(X_k)$ is $R(t, G_n)$, where $G_n$ is the mixture of the marginal
distributions of $\theta_{(n)}$,

(6.2) $G_n(x) = \frac{1}{n} \sum_{k=1}^n P\{\theta_k \le x\}.$

We derive GEB estimators whose compound risk approximates the ideal
Bayes risk $R^*(G_n)$. We measure the performance of this ideal approximation
via oracle inequalities of the form

(6.3) $\frac{1}{n} \sum_{k=1}^n E(\hat\theta_k - \theta_k)^2 - R^*(G_n) \le r(n, G_n),$

where $r(n, G)$ are functionals of $n$ and univariate distributions $G$ only. The
definition of $r(n, G)$ may vary in different statements in the sequel, as long
as (6.3) holds under specified conditions.

The components of the vector $\theta_{(n)}$ are assumed to be independent in
Theorem 6.1 below. In all other theorems, conditions on $\theta_{(n)}$ are imposed only
through the mixture $G_n$ in (6.2), so that $\theta_k$ are allowed to be stochastically
dependent. The independence assumption on $\theta_{(n)}$ in Theorem 6.1
accommodates the two important special cases of deterministic and i.i.d. $\theta_k$.
This allows us to apply Theorem 6.1 conditionally on $\theta_{(n)}$ whenever $r(n, G)$
in (6.3) is concave in $G$. Note that if $r(n, G)$ is concave in $G$, (6.3) follows
from its conditional version given $\theta_{(n)}$, since $R^*(G)$ is always concave in $G$
due to the linearity of $R(t, G)$ in $G$ in (6.1).




6.1. GEB estimators. Zhang (1997) proposed the following GEB estimators:

(6.4) $\hat\theta_k = \hat t_{n,\rho}(X_k), \qquad \hat t_{n,\rho}(x) = x + \frac{\varphi_n'(x)}{\max(\varphi_n(x), \rho)},$

where $\rho = \rho_n \to 0+$, $1/n < \rho < 1/\sqrt{2\pi}$, and $\varphi_n$ is the kernel estimator

(6.5) $\varphi_n(x) = \frac{1}{n} \sum_{k=1}^n a_n K(a_n(x - X_k)) = \int_{-a_n}^{a_n} \frac{e^{-ixu}}{2\pi} \sum_{k=1}^n \frac{e^{iuX_k}}{n} \, du$

with the kernel $K(x) = \sin(x)/(\pi x)$. We use the special $a_n = \sqrt{2 \log n}$
throughout the sequel, which provides the best bounds in this paper. We first
describe an improved version of the oracle inequality of Zhang (1997) and its
immediate consequences.
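
As a rough illustration of (6.4) and (6.5), the following Python sketch evaluates the sinc-kernel estimate, its derivative and the resulting GEB rule at a single point. It is only a sketch under stated assumptions: the function names are hypothetical, the truncation level `rho` is supplied by the caller (the specific choice in (2.11) is not reproduced here), and the derivative of the kernel is obtained by elementary calculus rather than taken from the paper.

```python
import math

def sinc_kernel_density(x, data, a):
    """Kernel estimate as in (6.5): average of a*K(a*(x - X_k)), K(z) = sin(z)/(pi*z)."""
    total = 0.0
    for xk in data:
        z = x - xk
        if abs(z) < 1e-12:
            total += a / math.pi  # limit of sin(a*z)/(pi*z) as z -> 0
        else:
            total += math.sin(a * z) / (math.pi * z)
    return total / len(data)

def sinc_kernel_density_deriv(x, data, a):
    """Derivative in x of the kernel estimate above."""
    total = 0.0
    for xk in data:
        z = x - xk
        if abs(z) >= 1e-12:
            # d/dz [sin(a*z)/(pi*z)] = (a*z*cos(a*z) - sin(a*z)) / (pi*z^2)
            total += (a * z * math.cos(a * z) - math.sin(a * z)) / (math.pi * z * z)
        # at z = 0 the derivative is zero because the kernel is even
    return total / len(data)

def geb_estimate(x, data, rho):
    """GEB rule as in (6.4), with bandwidth parameter a_n = sqrt(2 log n); needs n >= 2."""
    a = math.sqrt(2.0 * math.log(len(data)))
    phi = sinc_kernel_density(x, data, a)
    dphi = sinc_kernel_density_deriv(x, data, a)
    return x + dphi / max(phi, rho)
```

Applying `geb_estimate(x_k, data, rho)` to each observation gives the compound estimate of the whole vector; the cost of this naive version is quadratic in n.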

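Theorem 6.1. Suppose the components of $\theta_{(n)} = (\theta_1, \ldots, \theta_n)$ are
independent variables. Let $\hat\theta_k = \hat t_{n,\rho}(X_k)$ be the GEB estimator in (6.4) with
$\rho^{-1}(\log n)^{1/4}/\sqrt{n} = o(1)$. Then (6.3) holds with

(6.6) $r(n, G) = \Delta(\rho, G) + \{1 + \eta(n, \rho)\}\Delta^*(n, \rho),$

where $\eta(n, \rho) = o(1)$ depending on $(n, \rho)$ only,

(6.7) $\Delta(\rho, G) = \int_{-\infty}^{\infty} \left(\frac{\varphi_G'}{\varphi_G}\right)^2 \left\{1 - \frac{\varphi_G}{\varphi_G \vee \rho}\right\}^2 \varphi_G \, dx$

with $\varphi_G = \varphi(x; G)$ in (2.5), $G_n$ in (6.3) is as in (6.2), and

(6.8) $\Delta^*(n, \rho) = \left\{\sqrt{(2/3)\log n} + \sqrt{-\log(\rho^2)}\right\}^2 \frac{\sqrt{2 \log n}}{\pi \rho n}.$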

Remark. (i) The oracle inequality (6.6) was proved in Zhang [(1997),
Theorem 1] under the stronger condition $\rho^{-1}\sqrt{(\log n)/n} = o(1)$. The weaker
condition is needed since (2.11) is used in this paper. (ii) By (6.7),
$R^*(G) + \Delta(\rho, G) \le 1$, since

(6.9) $R^*(G) = 1 - \int \left(\frac{\varphi_G'}{\varphi_G}\right)^2 \varphi_G \, dx.$

The main consequences of Theorem 1 of Zhang (1997) and Theorem 6.1 
above under weaker conditions on p n are asymptotic minimaxity and asymp- 
totic optimality. It is well known that the minimax mean squared error for 
compound estimation of normal means is the common variance. 

Theorem 6.2. Let $\hat\theta_k = \hat t_{n,\rho}(X_k)$ be as in (6.4) with $\rho = \rho_n \to 0$ and
$(\log n)^{1/4}/(\rho_n\sqrt{n}) \to 0$.

(i) Asymptotic minimaxity: For the $\Delta^*(n, \rho)$ in (6.8),

$\sup_{\theta_{(n)}} \frac{1}{n} \sum_{k=1}^n E_{\theta_{(n)}}(\hat\theta_k - \theta_k)^2 \le 1 + (1 + o(1))\Delta^*(n, \rho_n) \to 1.$

(ii) Asymptotic optimality: If $G_n$ converges in distribution, then

(6.10) $\frac{1}{n} \sum_{k=1}^n E(\hat\theta_k - \theta_k)^2 - R^*(G_n) \to 0,$

where $R^*(G)$ and $G_n$ are as in (6.1) and (6.2). Moreover, for $m_n = o(1/\rho_n)$
and any stochastically bounded family $\mathscr{G}$ of distributions, (6.10) holds if for
certain $0 \le w_{n,0} \to 0$ and distributions $H_{n,0}$,
$G_n(x) = \sum_{j=0}^{m_n} w_{n,j} H_{n,j}(x - c_{n,j})$ with $H_{n,j} \in \mathscr{G}$ for $j \ge 1$ and reals
$w_{n,j} \ge 0$ and $c_{n,j}$, that is, $G_n$ are within $o(1)$ mass from mixtures of at most
$o(1/\rho_n)$ arbitrary translations of distributions in $\mathscr{G}$. In particular, (6.10)
holds if $\int_{|x - c_n| > m_n} dG_n(x) \to 0$ for $m_n = o(1/\rho_n)$ and certain constants $c_n$.

Remark. (i) Zhang [(1997), Proposition 2 and Corollary 3] pointed out
that (6.10) holds when $G_n$ converges in distribution or when $G_n$ are arbitrary
discrete distributions with no more than $o(1/\rho_n)$ components. The weaker
condition in Theorem 6.2(ii) is equivalent to $G_n(A_n) \to 1$ for certain unions
$A_n$ of at most $m_n = o(1/\rho_n)$ intervals of unit length. This demonstrates the
extent of adaptivity of GEB estimators when $\{\theta_k\}$ has many clusters.

(ii) The proof of Theorem 6.2(ii) utilizes the following inequality: for all
distributions $H_j$ and weights $w_j \ge 0$ with $\sum_{j=0}^m w_j = 1$,

(6.11) $\Delta\left(\rho, \sum_{j=0}^m w_j H_j\right) \le \sum_{j=0}^m w_j \Delta(\rho/w_j, H_j).$

(iii) The locally uniform asymptotic optimality criterion in (6.10) is slightly
stronger than the usual one for fixed $G = G_n$ in the EB setting.

6.2. Oracle inequalities based on tail-probabilities and moments. We shall
derive more explicit oracle inequalities in terms of the tail and moments
of $G_n$. Define

(6.12) $\bar G(x) = \int_{|u| > x} dG(u), \qquad \mu_p(G) = \left(\int |u|^p \, dG(u)\right)^{1/p}, \qquad 0 < p < \infty.$
Lemma 6.1. Let $x > 0$, $0 < \rho < 1/\sqrt{2\pi}$ and $\varphi_G$ be as in (2.5). Then

(6.13) $\Delta(\rho, G) \le \int_{\varphi_G < \rho} \left(\frac{\varphi_G'}{\varphi_G}\right)^2 \varphi_G \, dx \le \bar G(x) + 2x\rho \max\{L^2(\rho), 2\} + 2\rho\sqrt{L^2(\rho) + 2},$

where $\Delta(\rho, G)$ is as in (6.7) and $L(\rho) = \sqrt{-\log(2\pi\rho^2)}$. Furthermore, for
$x = L(\rho)/2$,

(6.14) $\Delta(\rho, G) \le \bar G(x) + \bar G^2(x)(1 - \bar G(x)) + 2\rho\sqrt{L^2(\rho) + 2}.$

Lemma 6.1 is used in combination with Theorem 6.1 to produce more
explicit oracle inequalities in Theorems 6.3 and 6.4 below, with (6.13) for
stochastically large $G_n$ and (6.14) for stochastically small $G_n$. For
stochastically very small $G_n$ and $-\log \rho_n^2 \le (1 + o(1))\log n$, the leading term in the
combination of (6.6) and (6.14) is

(6.15) $\Delta^*(n, \rho_n) + 2\rho_n\sqrt{L^2(\rho_n) + 2} \le (1.724 + o(1)) \frac{\log n}{\sqrt{n}} \left(\frac{\rho_n^*}{\rho_n} + \frac{\rho_n}{\rho_n^*}\right)$

with equality for $-\log \rho_n^2 = (1 + o(1))\log n$, where $\rho_n^* = 0.6094\sqrt{2(\log n)/n}$.
The choice of $\rho \sim \rho^*$ and the oracle inequalities below are not necessarily
optimal, since crude bounds are used at several places in the proofs. In
principle, we may use data-driven $\rho$ via any methods of choosing tuning
parameters, but this is beyond the scope of this paper. In what follows, we
denote by $\eta_n$ constants depending on $n$ only and satisfying $\eta_n \to 0$.
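
The constants 0.6094 and 1.724 in (6.15) can be checked with a few lines of arithmetic. The sketch below assumes, as a reading of (6.8), that for $-\log \rho^2 = \log n$ the term $\Delta^*(n, \rho)$ behaves like $A(\log n)^{3/2}/(\rho n)$ with $A = (\sqrt{2/3} + 1)^2\sqrt{2}/\pi$, and that $2\rho\sqrt{L^2(\rho) + 2}$ behaves like $B\rho\sqrt{\log n}$ with $B = 2$; minimizing $A/\rho + B\rho$ type expressions then gives both constants. The variable names are hypothetical.

```python
import math

# Coefficient of (log n)^{3/2} / (rho * n) in Delta*(n, rho) when -log(rho^2) = log n
# (assumed leading-order reading of (6.8)).
A = (math.sqrt(2.0 / 3.0) + 1.0) ** 2 * math.sqrt(2.0) / math.pi

# Coefficient of rho * sqrt(log n) in 2 * rho * sqrt(L(rho)^2 + 2) under the same assumption.
B = 2.0

# Minimizing A * s / rho + B * rho * t over rho gives rho* = sqrt(A / B) * sqrt(s / t),
# here sqrt(log n / n); rewrite it as rho0_star * sqrt(2 * log n / n).
rho0_star = math.sqrt(A / B) / math.sqrt(2.0)

# The bound equals half_min * (rho*/rho + rho/rho*) * log n / sqrt(n).
half_min = math.sqrt(A * B)

print(round(rho0_star, 4), round(half_min, 3))
```

Under these assumptions the two printed values agree with the constants quoted in (6.15) to the displayed precision.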

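Theorem 6.3. Let $\hat\theta_k$ be as in (6.4) with $\rho = \rho(n)$ in (2.11). Then (6.3)
holds with

(6.16) $r(n, G) = \inf_x \left[\bar G(x) + (1 + \eta_n) x \sqrt{8} \rho_0 \frac{(\log n)^{3/2}}{\sqrt{n}}\right] + (1 + \eta_n) C_0(\rho_0) \frac{\log n}{\sqrt{n}} \le (1 + \eta_n) \left[C_p \left\{\mu_p(G) \rho_0 \frac{(\log n)^{3/2}}{\sqrt{n/8}}\right\}^{p/(p+1)} + C_0(\rho_0) \frac{\log n}{\sqrt{n}}\right],$

where $\bar G(x)$ and $\mu_p(G)$ are as in (6.12), $C_p = p^{1/(p+1)} + p^{-p/(p+1)} \le 2$ and
$C_0(x) = 1.724(0.6094/x + x/0.6094)$. Moreover, for $x_n = \sqrt{-\log(2\pi\rho^2)}/2$,
inequality (6.3) holds with

(6.17) $r(n, G) = (5/4)\bar G(x_n) + (1 + \eta_n) C_0(\rho_0)(\log n)/\sqrt{n}.$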

Theorem 6.3 provides the asymptotic optimality of GEB estimators with
convergence rates $\{(\log n)^{3/2}/\sqrt{n}\}^{p/(p+1)}$ in (6.10) for dependent $\{\theta_k\}$ with
bounded $\mu_p(G_n)$.
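
The exponent $p/(p+1)$ and the constant $C_p$ can be traced to a one-line optimization: bounding the tail by Markov's inequality, $\bar G(x) \le (\mu_p(G)/x)^p$, and minimizing the sum of this bound and a linear term in $x$. The sketch below checks the resulting closed form numerically; it is an illustration of that calculation only, with hypothetical function names and arbitrary test values, and `slope` stands in for the coefficient of $x$ in the first line of (6.16).

```python
def c_p(p):
    """Constant C_p = p^(1/(p+1)) + p^(-p/(p+1))."""
    return p ** (1.0 / (p + 1.0)) + p ** (-p / (p + 1.0))

def tail_plus_linear(x, mu, slope, p):
    """Markov bound (mu/x)^p on the tail probability plus the linear term slope*x."""
    return (mu / x) ** p + slope * x

def minimizer(mu, slope, p):
    """Stationary point of tail_plus_linear in x, from setting the derivative to zero."""
    return (p * mu ** p / slope) ** (1.0 / (p + 1.0))

def closed_form_minimum(mu, slope, p):
    """Value of the minimum: C_p * (mu * slope)^(p/(p+1))."""
    return c_p(p) * (mu * slope) ** (p / (p + 1.0))

if __name__ == "__main__":
    for p in (0.5, 1.0, 2.0):
        x_star = minimizer(1.7, 0.03, p)
        print(p, tail_plus_linear(x_star, 1.7, 0.03, p), closed_form_minimum(1.7, 0.03, p))
```

With `slope` of order $(\log n)^{3/2}/\sqrt{n}$ and bounded `mu`, the minimum is of order $\{(\log n)^{3/2}/\sqrt{n}\}^{p/(p+1)}$, which is the rate quoted above.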
6.3. Stochastically very small distributions and threshold estimators. The
risk bounds in Theorems 6.1 and 6.3 are not very useful if an overwhelming
majority of $\theta_k$ are essentially zero, for example, $\mu_2(G_n) < 1/\sqrt{n}$. For these
stochastically very small empirical distributions $G_n$, threshold estimators
may outperform the GEB estimators (6.4).

Soft threshold estimators are defined by

(6.18) $\hat\theta_k = s_\lambda(X_k), \qquad s_\lambda(x) = \mathrm{sgn}(x)(|x| - \lambda)_+,$

where $\lambda > 0$ is a threshold level. Hard threshold estimators are defined by
functions $h_\lambda(x) = x I\{|x| > \lambda\}$. Hard and soft threshold estimators have
similar properties. We consider soft threshold estimators so that sharp oracle
inequalities in Lemma 6.2 below can be utilized.
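
For concreteness, the two threshold rules can be written as follows. This is a minimal sketch with hypothetical function names; the hard-threshold rule uses the standard keep-or-kill form with a strict inequality, which is an assumption about the boundary case.

```python
import math

def soft_threshold(x, lam):
    """Soft threshold as in (6.18): shrink |x| by lam and keep the sign of x."""
    return math.copysign(max(abs(x) - lam, 0.0), x)

def hard_threshold(x, lam):
    """Hard threshold: keep x unchanged if |x| > lam, otherwise set it to zero."""
    return x if abs(x) > lam else 0.0

if __name__ == "__main__":
    lam = math.sqrt(2.0 * math.log(1024))  # universal threshold for n = 1024
    for x in (-5.0, -1.0, 0.5, 4.0):
        print(x, soft_threshold(x, lam), hard_threshold(x, lam))
```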

The performance of (6.18) is commonly compared with
$\kappa(G_n) = E\sum_{k=1}^n (\theta_k^2 \wedge 1)/n$ given in (2.7). For $A \subseteq \{1, \ldots, n\}$, let $t_A$ be the
estimator defined by $\hat\theta_k = X_k I\{k \in A\}$. Since the MSE of $X_k$ is smaller than
the MSE of $\hat\theta_k = 0$ iff $|\theta_k| > 1$, $\kappa(G_n) = \inf_A R(t_A, G_n)$ when $\theta_{(n)}$ is
deterministic. Thus, $\kappa(G_n)$ is the ideal risk for a different oracle expert,
someone with the knowledge of the best choice of $A$, who always uses the best $t_A$.

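Lemma 6.2. Let $s_\lambda = s_\lambda(x)$ be as in (6.18) and let the risk $R(t, G)$ be
as in (6.1). Then $E_\theta(s_\lambda(X) - \theta)^2$ is increasing in $|\theta|$, and

(6.19) $R(s_\lambda, G) \le \int \min\left\{u^2 + \frac{4}{\lambda^3}\varphi(\lambda), \lambda^2 + 1\right\} dG(u).$

Consequently, for $\lambda = \sqrt{2 \log n}$ and with $\mu_p(G)$, $\bar G(x)$ and $\kappa(G)$ as in (6.12)
and (2.7), $p < 2$,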
and (2.7), p < 2, 

^ (SA ' G) ~ ^{ (2tog»flWi ' ^ 21 ° gn ^ 1 ) + K ^} 
(6 - 20) + V2 



(logn) 3 / 2 



n 



The inequalities in Lemma 6.2 are essentially the oracle inequality of
Donoho and Johnstone (1994a). The improvement with the extra factor
$(\log n)^{-3/2}$ in the second term on the right-hand side of (6.20) is needed when
we apply it to all high-resolution levels $j$ up to infinity in the sequence
model (1.1). Lemma 6.2 implies $R(s_\lambda, G) \le (\lambda^2 + 1)\kappa(G) + 4\lambda^{-3}\varphi(\lambda)$, which
is an oracle inequality since it compares $R(s_\lambda, G)$ with the ideal risk $\kappa(G)$.
For $\lambda = \sqrt{2 \log n}$, Foster and George (1994) showed that $\lambda^2 + 1$ is the optimal
risk inflation factor from a model selection point of view.
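
The pointwise bound behind (6.19) can be checked numerically for a point mass $G$ at $\theta$. The sketch below integrates the squared error of the soft threshold rule against the $N(\theta, 1)$ density with a plain trapezoid rule and compares it with $\min\{\theta^2 + 4\varphi(\lambda)/\lambda^3, \lambda^2 + 1\}$. The function names, the integration range and the grid size are assumptions made for this illustration, not part of the paper.

```python
import math

def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def soft_threshold(x, lam):
    """Soft threshold as in (6.18)."""
    return math.copysign(max(abs(x) - lam, 0.0), x)

def soft_risk(theta, lam, half_width=12.0, steps=24000):
    """E(s_lam(X) - theta)^2 for X ~ N(theta, 1), by the composite trapezoid rule."""
    lo = theta - half_width
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * (soft_threshold(x, lam) - theta) ** 2 * normal_pdf(x - theta)
    return total * h

def oracle_bound(theta, lam):
    """Integrand on the right-hand side of (6.19) at u = theta."""
    return min(theta ** 2 + 4.0 * normal_pdf(lam) / lam ** 3, lam ** 2 + 1.0)

if __name__ == "__main__":
    lam = 2.0
    for theta in (0.0, 0.5, 1.0, 2.0, 4.0):
        print(theta, soft_risk(theta, lam), oracle_bound(theta, lam))
```

At $\theta = 0$ the risk has the closed form $2\{(1 + \lambda^2)(1 - \Phi(\lambda)) - \lambda\varphi(\lambda)\}$, which gives an independent check of the quadrature.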

Since $\bar G(x) \le \kappa(G) \le \mu_p^p(G)$ for $p < 2$ and $x \ge 1$, the GEB oracle
inequality (6.17) (with $x_n \to \infty$) can be directly compared with (6.20). The risk
bound for the threshold estimator is of larger order than the regret of the
GEB estimator if $\kappa(G_n)\sqrt{n}/\log n \to \infty$.
6.4. Hybrid GEB methods. In the white noise model (1.1), $\sum_{j,k} \beta_{jk}^2 < \infty$,
so that the ideal risk $\kappa(G_n)$ converges to zero as $n = 2^j \to \infty$. Thus,
the performance of the GEB estimator (6.4) could be enhanced if hybrid
estimators are used, that is, switching to the threshold estimator (6.18) for
small $\kappa(G_n)$. By Zhang (1990), $G_n(x)$ and thus $\kappa(G_n) = \int_0^1 \bar G_n(u) \, du^2$ can
be estimated only at logarithmic rates. Our strategy is to construct hybrid
estimators based on accurate estimates of the order of $\kappa(G_n)$.

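The order of magnitude of $\kappa(G)$ in (2.7) is the same as that of

(6.21) $\tilde\kappa(G) = 1 - \int \sqrt{2} e^{-x^2/2} \varphi(x; G) \, dx = 1 - \int \exp(-u^2/4) \, dG(u).$

In fact, since $(1 - 1/e)x \le 1 - e^{-x} \le x$ for $0 \le x \le 1$,

(6.22) $\frac{e - 1}{4e}\kappa(G) \le (1 - 1/e)\int \left(\frac{u^2}{4} \wedge 1\right) dG(u) \le \tilde\kappa(G) \le \int \left(\frac{u^2}{4} \wedge 1\right) dG(u) \le \kappa(G).$

Thus, the order of $\kappa(G_n)$ can be estimated by

(6.23) $\hat\kappa_n = 1 - \frac{1}{n}\sum_{k=1}^n \sqrt{2}\exp(-X_k^2/2).$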
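
The sandwich (6.22) and the behavior of the estimate (6.23) are easy to examine by simulation. The sketch below treats the means as deterministic, so that $G_n$ is their empirical distribution, and uses hypothetical function names and an arbitrary sparse configuration of means. It relies on the identity $E\sqrt{2}\exp(-X^2/2) = \exp(-\theta^2/4)$ for $X \sim N(\theta, 1)$, which is what makes (6.23) unbiased for the quantity in (6.21).

```python
import math
import random

def kappa(thetas):
    """Ideal risk: average of min(theta^2, 1) over the means."""
    return sum(min(t * t, 1.0) for t in thetas) / len(thetas)

def kappa_tilde(thetas):
    """Surrogate as in (6.21) for the empirical distribution of the means."""
    return 1.0 - sum(math.exp(-t * t / 4.0) for t in thetas) / len(thetas)

def kappa_hat(xs):
    """Estimate as in (6.23), computed from the observations only."""
    return 1.0 - sum(math.sqrt(2.0) * math.exp(-x * x / 2.0) for x in xs) / len(xs)

def simulate(thetas, seed=0):
    """Draw X_k = theta_k + N(0, 1) noise with a fixed seed for reproducibility."""
    rng = random.Random(seed)
    return [t + rng.gauss(0.0, 1.0) for t in thetas]

if __name__ == "__main__":
    thetas = [0.0] * 18000 + [3.0] * 2000
    xs = simulate(thetas)
    print(kappa(thetas), kappa_tilde(thetas), kappa_hat(xs))
```

Note that `kappa_hat` can be slightly negative when almost all means are zero; only its order of magnitude is used.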

This suggests the following hybrid estimators:

(6.24) $\hat\theta_k = \hat t_n(X_k), \qquad \hat t_n(x) = \hat t_{n,\rho,\lambda,b}(x) = \begin{cases} \hat t_{n,\rho}(x), & \text{if } \hat\kappa_n > b, \\ s_\lambda(x), & \text{if } \hat\kappa_n \le b, \end{cases}$

where $\hat t_{n,\rho}(\cdot)$, $s_\lambda(\cdot)$ and $\hat\kappa_n$ are as in (6.4), (6.18) and (6.23), respectively.
For definiteness, we choose in (6.24) $\rho = \rho(n)$ and $b = b(n)$ in (2.11) and
$\lambda = \sqrt{2 \log n}$, unless otherwise stated. This choice of $\rho_n$ optimizes the order
of risk bound (6.17). The choice of $\lambda_n$ matches the universal thresholding
[Donoho and Johnstone (1994a)] and provides the optimal risk inflation
factor [Foster and George (1994)]. The choice $b_n$ here ensures the use of (6.4)
for large $\kappa(G_n)\sqrt{n}/\log n$.
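
The switching logic of (6.24) can be sketched as below. This is an illustration under assumptions, not the paper's implementation: the GEB rule is passed in as a callable `geb_rule` (any function of a single observation), the cutoff `b` is supplied by the caller because the choice in (2.11) is not reproduced here, and the use of a non-strict inequality for the threshold branch is an assumption about the boundary case.

```python
import math

def soft_threshold(x, lam):
    """Soft threshold as in (6.18)."""
    return math.copysign(max(abs(x) - lam, 0.0), x)

def kappa_hat(xs):
    """Estimate of the order of the ideal risk, as in (6.23)."""
    return 1.0 - sum(math.sqrt(2.0) * math.exp(-x * x / 2.0) for x in xs) / len(xs)

def hybrid_estimates(xs, geb_rule, b):
    """Use geb_rule when kappa_hat exceeds b, otherwise soft thresholding at sqrt(2 log n)."""
    lam = math.sqrt(2.0 * math.log(len(xs)))
    if kappa_hat(xs) > b:
        return [geb_rule(x) for x in xs]
    return [soft_threshold(x, lam) for x in xs]

if __name__ == "__main__":
    identity = lambda x: x  # placeholder standing in for a fitted GEB rule
    print(hybrid_estimates([0.1, -0.2, 0.05, 0.0], identity, b=0.1))
    print(hybrid_estimates([6.0, -7.0, 5.5, 8.0], identity, b=0.1))
```

The decision is made once per block from all observations in the block, so every coordinate in a block is treated by the same rule.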



Lemma 6.3. Suppose that $\{\theta_k\}$ are independent variables under the
expectation $E$. Let $\hat t_n = \hat t_{n,\rho,\lambda,b}$ be the hybrid estimator in (6.24) with
$\lambda = \lambda_n = \sqrt{2 \log n}$. Then

(6.25) $\sum_{k=1}^n \frac{E(\hat t_n(X_k) - \theta_k)^2}{n} \le \begin{cases} \sum_{k=1}^n E(\hat t_{n,\rho}(X_k) - \theta_k)^2/n + (2 + \eta_n)(\log n)/n, & \tilde\kappa(G_n) > b_n^+, \\ R(s_\lambda, G_n) + (1 + \eta_n)(\log n)^2/(\pi^2\rho^2 n^3), & \tilde\kappa(G_n) < b_n^-, \\ \sum_{k=1}^n E(\hat t_{n,\rho}(X_k) - \theta_k)^2/n + R(s_\lambda, G_n), & \text{otherwise}, \end{cases}$

with $\eta_n \to 0$ uniformly for all choices of $\rho = \rho_n$ and $b = b_n$, where $\hat t_{n,\rho}$, $s_\lambda$ and
$\tilde\kappa(G)$ are as in (6.4), (6.18) and (6.21), respectively, $b_n^+ = b_n + \sqrt{2(\log n)/n}$,
and $b_n^- = b_n - \sqrt{3(\log n)/n}$.

Remark. Let $(\rho, b)$ be as in (2.11) and $\lambda = \sqrt{2 \log n}$. By (6.17) and the
fact that $\bar G(1) \le \kappa(G)$, (6.3) holds with $r(n, G) = O(1)(\log n)/\sqrt{n}$ for the
GEB estimator (6.4) when $b_n^- \le \tilde\kappa(G_n) \le b_n^+$.

Theorem 6.4 below provides oracle inequalities for (6.24) in terms of the
tail of $G_n$ in (6.2).

Theorem 6.4. Let $\hat\theta_k = \hat t_{n,\rho,\lambda,b}(X_k)$ be the hybrid GEB estimator in (6.24)
with $(\rho, b)$ in (2.11) and $\lambda = \sqrt{2 \log n}$. Then there exists a constant $M < \infty$
such that (6.3) holds with

(6.26) $r(n, G) = M \min\{r_0(n, G), r_{p \wedge 2}(n, \mu_p(G))\} + \frac{1 + \eta_n}{n(\log n + 1)^{3/2}} \qquad \forall p > 0, n,$

where $r_p(n, C)$ and $\mu_p(G)$ are as in (3.3) and (6.12), and with $\bar G$ as in (6.12),

(6.27) $r_0(n, G) = \min\left[1, \int_0^{\log n} \bar G(\sqrt{u}) \, du, \frac{\log n}{\sqrt{n}} + \inf_{x \ge 1}\left\{\bar G(x) + x\frac{(\log n)^{3/2}}{\sqrt{n}}\right\}\right].$

Remark. It follows from our proof (with slight modification) that if a
larger $\lambda = \sqrt{2(1 + A_0)\log n}$ is used in (6.24) with $A_0 > 0$, Theorem 6.4 holds
with $(1 + \eta_n)/\{n^{1+A_0}(\log n)^{3/2}\}$ as the second term on the right-hand side
of (6.26). See the remark below Theorem 3.2.

Theorem 6.4 implies that the compound risk is approximately $n^{-1}(\log n)^{-3/2}$
when $\theta_k = 0$ for all $k$. Proposition 6.1 facilitates applications of Theorem 6.4.
Proposition 6.1. Let $r_0(n, G)$ and $r_p(n, C)$ be as in (6.27) and (3.3).
Let $w' \wedge w'' > 0$.

(i) ro(n, G) is concave in G and ro(n,G) <<Sr pA 2(n,/J> p (G)) forallp>0. 

(ii) If G < w'G +w"G' for two distributions G' and G" , then ro(n, G) < 
r (n;w'G') + r (n;w"G"). 

6.5. Minimax risks in $\ell_p$ balls. Now we compare the minimax risk

(6.28) $\mathscr{R}_n(\Theta) = \inf_{\hat\theta_{(n)}} \sup_{\theta_{(n)} \in \Theta} \frac{1}{n}\sum_{k=1}^n E_{\theta_{(n)}}(\hat\theta_k - \theta_k)^2, \qquad \Theta \subseteq \mathbb{R}^n,$

in $\ell_p$ balls with the maximum of the Bayes risk $R^*(G)$ in (6.1) in $L_p$ balls. Here
$\theta_{(n)} = (\theta_1, \ldots, \theta_n)$ are considered as deterministic vectors and the
minimization is taken over all estimators $\hat\theta_{(n)}$ based on $X_{(n)}$. Our result is based on
Proposition 6.2 below, which provides the continuity of the Bayes risk $R^*(G)$
in $G$. Let $\|\theta_{(n)}\|_{p,n} = (\sum_{k=1}^n |\theta_k|^p)^{1/p}$ as in (3.9). The $\ell_p$ balls are defined as

$\Theta_{p,n}(C) = \{\theta_{(n)} : n^{-1/p}\|\theta_{(n)}\|_{p,n} \le C\},$

while the $L_p$ balls are $\{G : \mu_p(G) \le C\}$ with the $\mu_p(G)$ in (6.12).

Proposition 6.2. Let R*(G) be as in (6.1). For all distributions Hi 
and H2 in M 



(6.29) \R*((l - w)H x + wH 2 ) - < w{l + V21og {yfi/w) f ■ 

Furthermore, if there exist random variables 9 k ~ G k with P{\9\ — 9q\ < 
V2} > 1 — Tji > 0, then 



(6.30) \R*(Gx) - R*(G )\ < 2 m {l + v^log (V2/ m ) } 2 + Vsjl + 



Proposition 6.3. Let p' =p A2 and ^(u) = {ulog^l/u)} 1 / 3 . Define 
*(log + (C)/(V)), ifC>l, 



(6.31) r;(n,C) = 



M/(C 2 p'{ log + (l/CP')f~ p /(np 2 p')), ifC<l, 



and r^(n,C) = 0. Then there exists a universal constant M such that for 
all < p < 00 

(6.32) K n (& p>n (C)) < sup R*(G)<n n (e p>n (C)) + Mr*Jn,C). 

ti p (G)<C 
Remark. Let $\lambda = \{2\log_+(1/C^{p'})\}^{1/2}$. By (6.19) of Lemma 6.2,
uniformly in $p$ as $C^{p'} \to 0$,

(6.33) $\sup_{\mu_p(G) \le C} R^*(G) \le C^{p'}(\lambda^{2-p'} + 1 + 4/\lambda^3) = (1 + o(1))C^{p'}\{2\log_+(1/C^{p'})\}^{1-p'/2}.$

Thus, for small C v , (6.32) is sharp only when Mr*(n, C) is smaller than the 
right-hand side of (6.33), that is, large C p ' (np 2 p') / {log(np 2 p')} 1+p ' I 2 . 



The minimax risk in l v balls and the maximum Bayes risk in L p balls 
have been studied by Donoho and Johnstone (1994b), who proved 

l im ^ piG) <cR*(G) 

<£&f CpA2{_21og(CP A2 )}( 1 -(P A2 )/ 2 ) P ' 



(6.34) ft„(9 p , n (C))< sup R*(G) 

H P (G)<C 

<b 2 sup iT(G) Vp>0, b>l, 
lH,{G)<C/b 

and under the extra condition C p n/ (log nfl 2 — > oo for p < 2, 7£ n (6 Pjn (C)) ~ 
sup^((3-)<c< R*(G) as n — ► oo. Proposition 6.3 is derived from Proposition 6.2, 
(6.34) and Lemma A. 3 in the Appendix. 



6.6. Adaptive minimax estimation in $\ell_p$ balls. An immediate consequence
of Theorem 6.4 is the adaptive minimaxity of the GEB estimators in $\ell_p$
balls $\Theta_{p,n}(C)$, in view of the result of Donoho and Johnstone (1994b) on
the equivalence of the minimax risk in $\ell_p$ balls and the maximum Bayes risk
in $L_p$ balls.

Theorem 6.5. Let lZ n {®) be the minimax risk in (6.28) and 9/ n -\ = 
(01, ...,§ n )be the hybrid GEB estimator in Theorem 6.4. If 'C p ^h~/ '(log n ) 1+ (P A2 )< 12 
oo, then 

1 n 

(6.35) sup -J2E(6 k -6 k ) 2 = {l + o(l)}K n (Gp, n (C)). 

e in) ee P ,n(o) n k =i 

Moreover, if C v nj (log then (6.35) holds with the o(l) re- 

placed by O(l). 
7. Oracle and risk inequalities for block GEB estimators. We provide
here stronger versions of the theorems in Section 3. This is accomplished
by inserting the inequalities in Section 6 in individual blocks or resolution
levels. Throughout the section, $E_\beta$ denotes the expectation of models (2.2)
or (1.1) and $\beta$ is treated as a deterministic sequence. Performance of GEB
estimators in more general classes of $\beta$ will be considered in Section 8.

7.1. Oracle inequalities. Consider the general sequence (2.2). It follows
from (3.1) that

(7.1) $R^{(\varepsilon,*)}(\beta) = \varepsilon^2 \sum_j n_j \min_t R(t, G_j^{(\varepsilon)}) = \varepsilon^2 \sum_j n_j R^*(G_j^{(\varepsilon)}),$

where $R(t, G)$ and $R^*(G)$ are as in (6.1) and $G_j^{(\varepsilon)}$ are as in (2.4). By (2.3)
and (2.8)

3 ke\j] 

( 7 - 2 ) 

3 ke\j] 

where (Xk,6k) = (yk,Pk)/ e - Since (2.8) is the implementation of (6.24) in 
block j, application of Theorem 6.4 in individual blocks in (7.2) and (7.1) 
yields Theorem 3.1 and the following theorem. 

Theorem 7.1. Let and be as in Theorem 3.1, let R^(/3) 
be as in (3.1) and let ro(n,G) be as in (6.27). Then there is a universal 
constant M < oo such that 

(7.3) R^0M,P) - R^)(P) < Me^nMn^) + ^— 1 ^ }. 

7.2. Uniform ideal adaptation in Besov balls. In the wavelet setting (1.1), 



nj 



2\ and < C iff for certain Gj > with (Ej C|) 1/9 = C, 



2 J V1 a „\ !/P 



< 2 -i(tH-l/2) < 2 -i(»+l/2) £ y • 

e e 



(7-4) ^(G^) = (± g 

in view of (6.12), (2.4) and (3.9). Thus, the bound in Theorem 3.1 can be 
explicitly computed to provide uniform convergence rates for the regret of 
the GEB estimator (2.10). 

We first define certain constants and bounded nonincreasing slowly vary- 
ing functions. Set 

(7.5) a± = 2a + 1/2 — 1/p , a% = min(ai, a + 1/2), 
with $p' = \min(p, 2)$, and

_ 1/2 + 1/p' 
«i + 1/2 

= f l + 3/(2a + 2), ifaj/>l, 

72 ~ \ 3 - (1/2 + 2/p')/(a 1 + 1/2), otherwise. 

Let 7 = p'(a + 1/2) - 1 and p = 1/(1 - p'/q)+ £ [l,oo]. Define 

(7.7) L« (e) = ^ (1 + a)-^ [c^ + ( ^^ f ^ < M 
with c4 |P|g = (1 - 1/2PT)~ 1 /P an d d' a ^ q = (1 - 1/2P7)-Vp-1+p'/2 ) and 

(7.8) Lt 2 )( £ )=Lg)( £ ) = (l + a)-^ 



mm 



(q + l)/log + (l/ £ ) 
x ' i _2-I(°+i)p7(p'+i)-i| 



Theorem 7.2. LetR^ £ '*\fi) be the ideal risk in (1.3) and let R^ 0^, (3) 
be the risk (1.2) of the GEB estimator (2.10). T/ien £/iere exists a universal 
constant M < oo suc/i £/iat 

sup{ J R^(/3( £ ),/3) - R^ £, *\f3) : /3 E B«,(C)} 

(7-9) 

< MC 2 |(e/C7) 2 + ^(e/O^/^'+V 2 ) logX J (G7e)L ( %/C) j, 

/or all < e < C and Besov balls Bp q (C) 6 -Sbcsov zn (3-8), where ctj > a 
and 7j are as m (7.5) and (7.6), and L( J ) are £/ie bounded nonincreasing 
slowly varying functions in (7.7) and (7.8). 

Remark. (i) Since $\alpha_1 \ge \alpha_2 > \alpha$, the right-hand side of (7.9) is of smaller
order than $\varepsilon^{2\alpha/(\alpha+1/2)}$. Thus, (3.10) and (7.9) imply Theorem 3.3.

(ii) The scale equivariance (2.12) and (2.13) of the GEB estimators (2.10)
is reflected in (7.9).

7.3. Minimax risks in Besov balls. Let (B) and R^ 6 '*^ (f3) be the min- 
imax and ideal risks in (1.4) and (3.1). It follows from Theorem 7.2 that for 
all Besov balls B £ £>b 0S ov 

(7.10) T&\B) < sup#( £ '*)(/?) + (l) e 2Q /( a + 1 /2) 

as e — > + . 

In this section, we provide an inequality which implies 
(7.11, su Pg B , C) ^ OT = 
Let $\|(x_1, \ldots, x_n)\|_{p,n} = (\sum_{k=1}^n |x_k|^p)^{1/p}$ for $p > 0$ and $n \ge 1$ with usual
extensions for $p \vee n = \infty$. Let $C_j$ denote nonnegative constants. It follows
from (1.4) and (3.9) that

n^{B« q (c)) 

> sup y infsup ( yE&0 k - p jh f : H^llU^i < Cj 1 
so that by (6.28) with the scale change 9jj c = f3jk/£ 

oo 

n®{B% tq (C)) > e 2 sup ^2^ 2J (e pi2J (2-^ Q+1 / 2 )C J / £ )). 
Il{c,ni 9 ,<»<c i=1 

Furthermore, it follows from (7.1) and (7.4) that 

sup R {£ '*\p) 

oo 

= e 2 sup y 2> sup {R*(G$ ) : ) < 2"^ +1 /2) C j /e} . 

The above facts and Proposition 6.3 imply 

sup R^\p)-T&\B^Jfl)) 

( 7 - 12 ) 

<e 2 sup V2- ? Mr*(2J,2-^ Q+1 / 2 )C j /e) 
l|{C J -H| 9 ,oo<C J - =1 

forther;(n,C) in (6.31), since sup MG) < c R*(G) -K n (@ p>n (C)) < Mr* p (n,C) 
for all (n,C). 

Theorem 7.3 below, which implies (7.11), is a consequence of (7.12). Define 

(7.13) a 3 = a + (a + l/2)/2, 73 = 2/3, 

and for 7 = p' (a + 1/2) — 1 define bounded slowly varying functions (e) = 
Lal(e) by 

(pV) 1/3 log+ /3 (l/e) 



(7.14) 



1/3 ( 7 + l)(2-p')/3 

+ (1 - 2 -7)l+(2-p')/3 

+ i0g + ^ P P J (1 _ 2 -7)l+(4-p')/3 
Theorem 7.3. Let $\mathscr{R}^{(\varepsilon)}(B)$ and $R^{(\varepsilon,*)}(\beta)$ be the minimax and ideal risks
in (1.4) and (1.3). Then (3.10) and (7.11) hold, and there exists a universal
constant $M < \infty$ such that

(7.15) $\sup_{\beta \in B} R^{(\varepsilon,*)}(\beta) \le \mathscr{R}^{(\varepsilon)}(B) + MC^2(\varepsilon/C)^{2\alpha_3/(\alpha_3+1/2)}\log^{\gamma_3}(C/\varepsilon)L^{(3)}(\varepsilon/C)$

for all $0 < \varepsilon \le C$ and Besov balls $B = B^\alpha_{p,q}(C) \in \mathscr{B}_{\mathrm{Besov}}$ in (3.8), where $\alpha_3 > \alpha$
and $\gamma_3$ are the constants in (7.13), and $L^{(3)}$ is the bounded nonincreasing
slowly varying function in (7.14).

Remark. For $p < q$, Donoho and Johnstone (1998) proved (3.10) and (7.11)
using the minimax theorem for certain classes of random $\beta$.

7.4. Exactly adaptive minimaxity and superefficiency. The universal
exactly adaptive minimaxity and related inequalities in Theorem 7.4 below
follow immediately from Theorems 7.2 and 7.3, since the sum of the
right-hand sides of (7.9) and (7.15) is of smaller order than the rate $\varepsilon^{2\alpha/(\alpha+1/2)}$
in (3.10), due to $\alpha_j > \alpha$, $j = 1, 2, 3$.

Theorem 7.4. Let (3^ be the GEB estimator in (2.3) or (2.10), and let 
TZ^(B) be the minimax risk in (1.4). Then there exists a universal constant 
M < 00 such that 

n^(B« q (Q) 

<su V {RV0( £ \p):/3eB« q (C)} 
<K^(B« g (C)) 

+ MC 2 j {e/C) 2 + ^( £ /C) W(%+i/2) log 7, ( C/e ) L (i) ( e/C ) j ; 

for all0<e<C and Besov balls B^ q (C) G B B esov in (3.8), where constants 
otj > a and jj and bounded functions iffl are as in Theorems 7.1 and 7.2. 

Remark. Since $\alpha_j > \alpha$, Theorem 7.4 and (3.10) imply Theorem 3.4.

Now we consider the superefficiency of the GEB estimators. Let $B$ be
a compact set under the Besov norm $\|\cdot\|^\alpha_{p,q}$ in (3.9) with $q < \infty$. Let $\Pi_J$ be
the projections up to resolution levels $J$, $(\Pi_J\beta)_{jk} = \beta_{jk}I\{j \le J\}$. Since
$\|\beta - \Pi_J\beta\|^\alpha_{p,q} \to 0$ for every $\beta \in B$ and $B$ is compact,

(7.16) $c^*_J(B) = \sup\{\|\beta - \Pi_J\beta\|^\alpha_{p,q} : \beta \in B\} \to 0$ as $J \to \infty$.

The superefficiency follows, since the risk for the estimation of $\Pi_J\beta$ by the
GEB estimator is at most $O(1)2^J\varepsilon^2$ and $\{\beta - \Pi_J\beta : \beta \in B\} \subseteq B^\alpha_{p,q}(c^*_J(B))$.
Formally, by (7.1) with $n_j = 2^j$,

$R^{(\varepsilon,*)}(\beta) \le \varepsilon^2 2^{J+1} + \varepsilon^2\sum_{j=J+1}^\infty 2^j R^*(G_j^{(\varepsilon)}) \le \varepsilon^2 2^{J+1} + \sup_{\|\beta\|^\alpha_{p,q} \le c^*_J(B)} R^{(\varepsilon,*)}(\beta)$

for all $\beta \in B$. Since $c^*_J(B) \to 0$, the right-hand side above is $o(\varepsilon^{2\alpha/(\alpha+1/2)})$
as $\varepsilon \to 0+$ and then $J \to \infty$, by (7.11) and (3.10) in Theorem 7.3. This and
(7.9) imply (3.12) and complete the proof of Theorem 3.5.

8. Bayes and more general classes. The results in Sections 3 and 7 can
be extended in several directions, for example, Bayes models, more general
deterministic and stochastic $\beta$ and blocks with sizes $n_j \ne 2^j$. The extension
to stochastic $\beta$ is relatively straightforward, since the key oracle inequalities
in Theorems 3.1 and 7.1 are valid under integration over $\beta$, for example,

(8.1) $ER^{(\varepsilon)}(\hat\beta^{(\varepsilon)}, \beta) - ER^{(\varepsilon,*)}(\beta) \le M\varepsilon^2\sum_j n_j\left\{r_0(n_j, G_j^{(\varepsilon)}) + \frac{1}{n_j(\log n_j + 1)^{3/2}}\right\}$

due to the concavity of $r_0(n, G)$ in $G$. We consider here certain general
classes of Bayes models including wavelet coefficients in Besov balls and of
functions with a large number of discontinuities.

Let (3 be a random sequence and let E be the certain expectation under 

which E^ of model (1.1) is the conditional expectation given (3. Let fl p {(3) 

be the sequence {{E\Pjf t \ p ) 1 ^ p } of the marginal L p norms of 0. Let /3ui = 

{Pjk, k < 1 V 2^'} and n(X u . . . , X n ) = n" 1 J2k=i E ( X k A !)• Consider 



.2) B& = h = p> + p» : f2 p (p>) G B « q (C), K((3ye) 



< 



m 



(e) 



2i 



1 A 



where vnS e ^ and are constants. Let J-d,m{C) be the class of piecewise 

polynomials / of degree d with no more than m pieces and ||/||oo < C. A 
deterministic (3 = (3' + f3" belongs to B^ if £ B^ q (C) and pj k = J <f> jk f are 

the wavelet coefficients of / E •?rf cm (e)(c^ £ ') as in Example 3.1, for certain 
fixed small c > 0. 

Theorem 8.1. Suppose (logefm^ = o(l)e~ 1/{ - a+1/2) and log + (M( e )) = 
0(|loge|). Then (2.10) is uniformly adaptive to the ideal risk R^ £, *\f3) 
in (1.3) over classes (8.2) of random [3, 

(8.3) sup{ER( £ \p {£ \/3) - ER( e, *\f3) : (3 G 5^} = o{e 2a/{a+1/2) ). 
Moreover, the GEB estimators are exactly adaptive minimax,

(8.4) sup ER^ 0& ,P) = (1 + o(l))K^ (B& ) = (1 + o(l))K^ (B« q (C)) , 

/3es( £ ) 

where TZ^(B) is the minimax £2 risk for the estimation of a random (3 in 
B. 

Remark. (i) Although the class in (8.2) is much larger than the Besov class
$B^\alpha_{p,q}(C)$, the minimax risks for the two classes are within an infinitesimal
fraction of each other.

(ii) The condition on $m^{(\varepsilon)}$ in Theorem 8.1 is the weakest possible up to a
factor of $(\log\varepsilon)^2$, since $m^{(\varepsilon)} = o(1)\varepsilon^{-1/(\alpha+1/2)}$ is a necessary condition.

(iii) Deterministic versions of the classes (8.2) were considered in Hall,
Kerkyacharian and Picard (1998) in the context of density estimation.

9. Equivalence between the white noise model and nonparametric regression.
In this section we establish the asymptotic equivalence between the
problems of estimating $f$ in the nonparametric regression model (4.1) and $\beta$
in the white noise model (1.1) in Besov classes when $\beta = \beta(f)$ are the Haar
coefficients of $f$. The asymptotic equivalence is used to prove the adaptive
minimaxity of the GEB estimators in Theorem 4.1. We assume throughout
the section that the design variables $t_i$ in (4.1) are independent uniformly
distributed in $(0, 1)$.

Theorem 9.1. Let Ef be as in Theorem 4.1 with i.i.d. uniform ti. Let 
(3(f) and (3(f) be as in (4.5) and (4.7), respectively, and let lij-.fij^^ 
Pj,kl{j < J} be the projections as in (7.16). 

(i) There exist finite constants M QjP such that 

(9.1) ^{nn^c/) -n^/5(/)||-,^F' < A^(2^/Ar)^/2 { || / g (/) ||^ F ' ? 

where p' = p A 2, and 

E f \\UjP-P\\ 2 2 
(9-2) j 

< M a ,A\\P\\ a P , g ) 2 {^n a p' = 1} + ± + 2 J (tt (v 2 -w) f 

(ii) Let e = a/VN and N^oo. For T = {/ : 0(f) £ B£ q (C)} and esti- 
mates /iv based on (4.1), 

(9.3) inf supE f \\f N - /|| 2 = (1 + o(l))K^(B« q (C)) 

In fe? 

for a 2 1 (a + 1/2) > l/p> - 1/2, where \\f\\ = f 2 ) 1 / 2 . 
Theorem 9.1(i) provides upper bounds for the difference between the
wavelet coefficients $\beta_{jk} = \int f\phi_{jk}$ and the corresponding coefficients for
the random discrete Haar system in (4.7). Deterministic discrete wavelet
systems were considered in Donoho and Johnstone (1998) based on Dubuc
(1986). For $\alpha > 1/p$ and $p \vee q > 1$ or $\alpha = p = q = 1$, Donoho and Johnstone
(1998) established (9.3) for deterministic discrete wavelet systems.

APPENDIX 

We shall denote by M generic finite universal constants which may take 
different values from one appearance to the next, that is, M = 0(1) uni- 
formly. 

Proof of Theorem 3.2. Consider small e > 0. Let rj > and 

f p {n, C) = min [1, C p , {C / ^i) p/{p+1) ) 

= min [1, C p , max {rT 1 / 2 , (C/^f 

It follows from Theorem 3.1 and part three of (3.4) that the regret (3.6) is 
bounded by 

sup r( £ \f3( £ \p)<0(e 2 ) + o{e 2 -v)Y^n j ? p ,{n j ,(en s j y 1 ), p'=pA2. 

We compute the above bound by splitting the sum into three pieces for 
rij E [x k ,x k +i), k = 0,1, 2, where x = 1, xi = e -V(«+i/2) , x 2 = £ - 2 /(2s-i/p') 
and X3 = 00. This yields by (3.4) 

5Z n iV( n i'( en i) _1 ) 

i 

<E%' + E n j (en° +1/2 r P ' /(p ' +1) + E %KT P ' 

j<xi xi<nj<X2 X2<nj 

< o(e-"){xi + X! + x 2 (ex|)-^} = 0(5"^), 

where $\nu = \max\{1/(s + 1/2), 1/(2s - 1/p')\} = 1/(\alpha_0 + 1/2)$. Thus the regret
is uniformly bounded by $o(1)\varepsilon^{2-\nu-2\eta}$. This completes the proof, since $\eta$ is
arbitrary and $2 - \nu = 2\alpha_0/(\alpha_0 + 1/2)$. □

Lemma A.1. Let $h^{(m)}(x) = (d/dx)^m h(x)$. Let $\varphi_n$ be given by (6.5) with
$a = a_n \ge \sqrt{2\log n}$. For $p \ge 2$, there exist universal constants $M_p < \infty$ such
that

$\int E|\varphi_n^{(m)}(x) - \varphi_{G_n}^{(m)}(x)|^p \, dx \le M_p\frac{a^{mp+p/2}}{n^{p/2}}\left[1 + \left(\frac{a}{n}\right)^{p/2-1}\right].$

Proof. Let $M_p$ denote any positive universal constant. We shall omit
the calculation involving the bias $b_n(x) = E\varphi_n(x) - \varphi_{G_n}(x)$, since it is of
smaller order in the sense that

ll^ m) (*)lloc<-/ u m e- u / 2 du<0(l)- 



n 

by (6.5) and the Fourier inversion formula, and by the Plancherel identity 



I&^ m) (^)| 2 ^ = - / u 2m e- u2 du < 0(1 



7T J a U 2 



Let $W_k(x) = a^{m+1}K^{(m)}(a(x - X_k))$ and $h_p(x) = \sum_{k=1}^n E|W_k(x)|^p/n$. Since
$\varphi_n^{(m)}(x)$ is the average of $W_k(x)$ and $\{W_k(x), k \le n\}$ are independent given
$\{\theta_k, k \le n\}$,

$E|\varphi_n^{(m)}(x) - E\varphi_n^{(m)}(x)|^p \le \frac{M_p}{n^{p/2}}h_2^{p/2}(x) + \frac{M_p}{n^{p-1}}h_p(x).$

This implies the conclusion, since $\|h_p\|_\infty + \int h_p(x) \, dx = O(a^{mp+p-1})$ via

$h_p(x) = \int |a^{m+1}K^{(m)}(a(x - u))|^p\varphi_{G_n}(u) \, du = a^{mp+p-1}\int |K^{(m)}(u)|^p\varphi_{G_n}(x - u/a) \, du.$ □

Proof of Theorem 6.1. The difference between the proof here and
that of Zhang (1997) is the use of the improved bounds in Lemma A.1. We
shall only describe the differences and refer to Zhang (1997) for the rest. Let
$a = a_n = \sqrt{2\log n}$.

The condition $\rho^{-1}a/\sqrt{n} = o(1)$ of Lemma 1 of Zhang (1997) can be
weakened to $\rho^{-1}\sqrt{a/n} = o(1)$, since by Theorem 2 of Zhang (1997) and
Lemma A.1

$$E\int \frac{\{\hat\varphi_n^{(m)}(x) - \varphi_{G_n}^{(m)}(x)\}^2\,\varphi_{G_n}(x)}{\max\{\hat\varphi_n(x),\rho\}}\,dx \le E\int \{\hat\varphi_n^{(m)}(x) - \varphi_{G_n}^{(m)}(x)\}^2\{1 + |\hat\varphi_n(x) - \varphi_{G_n}(x)|/\rho\}\,dx$$

$$\le E\|\hat\varphi_n^{(m)} - \varphi_{G_n}^{(m)}\|_2^2 + \rho^{-1}\bigl\{E\|\hat\varphi_n^{(m)} - \varphi_{G_n}^{(m)}\|_4^4\;E\|\hat\varphi_n - \varphi_{G_n}\|_2^2\bigr\}^{1/2}$$

$$\le \frac{(1+o(1))a^{2m+1}}{(2m+1)\pi n} + \frac{Ma^{2m+3/2}}{\rho n^{3/2}}.$$

The assumption $\rho^{-1}a/\sqrt{n} = o(1)$ is used in the proof of Theorem 3 of Zhang (1997) only for the application of Lemma 1 there. The assumption $a = O(\sqrt{\log n})$ is actually not used in the proof of Theorem 3 of Zhang (1997). Hence, Theorem 3 of Zhang (1997) holds under the weaker conditions $a \ge \sqrt{\log n}$ and $\rho^{-1}\sqrt{a/n} = o(1)$. The proof in Section 5 of Zhang (1997) is based on Theorem 3 of Zhang (1997) and the additional conditions $a \ge \sqrt{2\log n}$ and $\Delta^*(n,\rho) = O(1)$ only, since $a^{3/2}/(\rho n) = o(1)a^3/(\rho n)$. □

Proof of Theorem 6.2. Part (i) follows directly from Remark 2 below Theorem 6.1 and the fact that $\Delta^*(n,\rho_n) \to 0$ in Theorem 6.1. For part (ii), we shall first prove (6.11). Let $\varphi_j(\cdot) = \varphi(\cdot;H_j)$ and $\varphi_G = \varphi(\cdot;G)$ be as in (2.5) and $\tilde w_j = w_j\varphi_j/\varphi_G$. Since $\sum_j \tilde w_j = 1$, by Cauchy-Schwarz

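$$\Bigl(\frac{\varphi_G'}{\varphi_G}\Bigr)^2\varphi_G = \Bigl(\sum_j \tilde w_j\frac{\varphi_j'}{\varphi_j}\Bigr)^2\varphi_G \le \sum_j \tilde w_j\Bigl(\frac{\varphi_j'}{\varphi_j}\Bigr)^2\varphi_G = \sum_j w_j\Bigl(\frac{\varphi_j'}{\varphi_j}\Bigr)^2\varphi_j.$$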

This and (6.7) imply (6.11), since $1 - \varphi_G/(\varphi_G \vee \rho)$ is decreasing in $\varphi_G$ and $\varphi_G \ge w_j\varphi_j$. Let $A$ be a union of $m$ disjoint intervals $I_j$ of length $\le 1$. A distribution $G$ can be written as $G = \sum_{j=0}^m w_jH_j$, where $w_0 = G(A^c)$, $w_j = G(I_j)$ and $H_j$ are the conditional distributions given $\theta \in I_j$ under $G$. Define $\eta(\rho) = \sup\{\Delta(\rho,H) : H([0,1]) = 1\}$. Since $\Delta(\rho,H_j) \le 1$, (6.11) implies

$$\Delta(\rho,G) \le G(A^c) + \sum_{j=1}^m w_jI\{\rho M > w_j\} + \eta(1/M) \le G(A^c) + Mm\rho + \eta(1/M). \tag{A.1}$$

It follows from Proposition 2 of Zhang (1997) that $\eta(\rho) \to 0$ as $\rho \to 0$.

Now, let $A_n = \bigcup_{j=1}^{m_n}[c_{n,j} - M, c_{n,j} + M]$. The condition of part (ii) implies $G_n(A_n^c) \le w_{n,0} + \sup\{H([-M,M]^c) : H \in \mathscr{G}\} \to 0$ for large $n$ and $M$. Thus, we may assume $G_n(A_n^c) \to 0$ for certain $A_n = \bigcup_{j=1}^{m_n} I_{n,j}$ with disjoint intervals $\{I_{n,j}, j \le m_n\}$ of at most unit length and (possibly different) $m_n = o(1/\rho_n)$. Under this assumption and conditionally on $\theta_{(n)}$, Theorem 6.1 and (A.1) imply

$$\frac{1}{n}\sum_{k=1}^n E(\hat\theta_k - \theta_k)^2 \le ER^*(G_{(n)}) + EG_{(n)}(A_n^c) + Mm_n\rho_n + \eta(1/M) + o(1) \le ER^*(G_n) + o(1)$$

with $G_{(n)}(x) = n^{-1}\sum_{k=1}^n I\{\theta_k \le x\}$, as $n \to \infty$ and then $M \to \infty$, since $ER^*(G_{(n)}) \le R^*(EG_{(n)}) = R^*(G_n)$ due to the concavity of $R^*(G)$ in $G$ and

$$EG_{(n)}(A_n^c) = G_n(A_n^c) \to 0. \qquad\Box$$

Proof of Lemma 6.1. Let $x$ be fixed. Let $H_1$ and $H_2$ be the conditional distributions given $|\theta| > x$ and $|\theta| \le x$, respectively, under $G$. Let $w_1 = \bar G(x)$, $w_2 = 1 - w_1$ and $\varphi_j(\cdot) = \varphi(\cdot;H_j)$ be as in (2.5). Since $H_2([-x,x]) = 1$, by the unimodality of $\varphi$, $\varphi_2$ is monotone in both $(-\infty,-x)$ and $(x,\infty)$. By Lemma 2 of Zhang (1997), $|\varphi_2'/\varphi_2| \le \tilde L(\varphi_2)$. This and the monotonicity of $\varphi_2$ imply

/ip' 2 \ 2 f°° ~ 

/ {<P2<p/»2} ^2< j(_ I {(p2<p/W2} L(99 2 )|^2| 

JO ^2 JO 



and a similar inequality for $\int_{-\infty}^{-x}$. These and Cauchy-Schwarz imply

t/ u |>/^<^>(^) V2du< j*L{u)du<(p j^L\u)du^ ' 

(A.2) 

= pVL 2 (p)+2. 



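For (6.13), we find again by Lemma 2 of Zhang (1997) that

$$w_2\int_{|u|\le x} I\{\varphi_2 \le \rho/w_2\}\Bigl(\frac{\varphi_2'}{\varphi_2}\Bigr)^2\varphi_2\,du \le w_2\int_{|u|\le x} I\{\varphi_2 \le \rho/w_2\}\tilde L^2(\varphi_2)\varphi_2\,du \le 2x\max\{\tilde L^2(\rho),2\}\rho$$

due to the monotonicity of $u\max\{\tilde L^2(u),2\}$ and $\tilde L^2(u)$. Thus, (6.13) holds, since by (6.11)

$$\Delta(\rho,G) \le w_1 + w_2\Delta(\rho/w_2,H_2) \le w_1 + w_2\int_{\varphi_2\le\rho/w_2}\Bigl(\frac{\varphi_2'}{\varphi_2}\Bigr)^2\varphi_2.$$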



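For (6.14), $\{1 - \varphi_2/(\varphi_2 \vee (\rho/w_2))\} \le \{1 - \rho/(\rho \vee (\rho/w_2))\} = w_1$ for $|u| \le x$, since $\varphi_2(u) \ge \varphi(2x) = \rho$. Thus, (6.14) follows from (A.2) and

$$w_2\int_{|u|\le x}\Bigl(\frac{\varphi_2'}{\varphi_2}\Bigr)^2\varphi_2\Bigl\{1 - \frac{\varphi_2}{\varphi_2\vee(\rho/w_2)}\Bigr\}\,du \le w_1w_2. \qquad\Box$$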

Proof of Theorem 6.3. Since the right-hand sides of (6.16) and (6.17) are both concave in $G$, it suffices to apply Theorem 6.1 conditionally on $\theta_{(n)}$. By (6.8) and simple calculation, (6.15) holds, so that (6.17) follows from Theorem 6.1 and (6.14). For (6.16) we use the Markov inequality $\bar G(x) \le \mu_p^p(G)/x^p$ in (6.13) and then minimize $\mu_p^p(G)/x^p + 2x\rho_n(1+o(1))\log n$ over $x > 0$. □



Proof of Lemma 6.2. By (6.1) it suffices to verify (6.19) for degenerate $G$. Let $X \sim N(\mu,1)$ under $P_\mu$. Let $h(x) = s_\lambda(x) - x = \min(\lambda,\max(-\lambda,-x))$. For $\theta_1 = \mu$,

$$R(s_\lambda,G_1) = R_s(\mu;\lambda) = E_\mu\{h(X) + X - \mu\}^2 = E_0\{h(X+\mu) + X\}^2.$$

Differentiating twice the right-hand side above with respect to $\mu$, we find

$$\Bigl(\frac{\partial}{\partial\mu}\Bigr)^2R_s(\mu;\lambda) = 2\int_{-\lambda-\mu}^{\lambda-\mu}\varphi(u)\,du + 2\mu\varphi(\lambda+\mu) - 2\mu\varphi(\lambda-\mu) \le 2$$

for all positive $\mu$ and $\lambda$. Since $R_s(\mu;\lambda)$ is an even function of $\mu$, $R_s(\mu;\lambda) \le R_s(0;\lambda) + \mu^2$. This implies the first component of (6.19) due to

$$R_s(0;\lambda) = 2\int_\lambda^\infty (u-\lambda)^2\varphi(u)\,du = 2\varphi(\lambda)\int_0^\infty u^2e^{-\lambda u - u^2/2}\,du \le 2\lambda^{-3}\varphi(\lambda)\int_0^\infty u^2e^{-u}\,du.$$

The second component of (6.19) follows from the monotonicity of $R_s(\mu;\lambda)$ in $|\mu|$, proved below, as $\lim_{\mu\to\infty}R_s(\mu;\lambda) = \lambda^2 + 1$. By Stein's formula of mean-squared error,

$$R_s(\mu;\lambda) = E_\mu\{h^2(X) + 1 + 2h'(X)\} = \int_0^\lambda P_\mu\{|X| > u\}\,du^2 + 2P_\mu\{|X| > \lambda\} - 1.$$

The monotonicity of $R_s(\mu;\lambda)$ then follows from that of $P_\mu\{|X| > u\}$ in $|\mu|$.

Inequality (6.20) is a direct consequence of (6.19). □
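The three facts used in this proof (the bound $R_s(\mu;\lambda) \le R_s(0;\lambda) + \mu^2$, the bound $R_s(\mu;\lambda) \le \lambda^2+1$ via monotonicity in $|\mu|$, and $R_s(0;\lambda) \le 2\lambda^{-3}\varphi(\lambda)\int_0^\infty u^2e^{-u}\,du = 4\lambda^{-3}\varphi(\lambda)$) can be illustrated numerically. The following is a minimal sketch, not the paper's computation: it assumes the closed form of the soft-threshold risk that follows from Stein's formula for $X \sim N(\mu,1)$, and the function names are hypothetical.

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def soft_risk(mu, lam):
    """Risk E(s_lam(X) - mu)^2 of soft thresholding for X ~ N(mu, 1).

    Closed form assumed in this sketch (derived from Stein's formula).
    """
    prob_inside = Phi(lam - mu) - Phi(-lam - mu)  # P{|X| <= lam}
    return (1.0 + lam * lam
            + (mu * mu - lam * lam - 1.0) * prob_inside
            - (lam - mu) * phi(lam + mu)
            - (lam + mu) * phi(lam - mu))

def check_bounds(lam, mus, tol=1e-12):
    """Check the three bounds on an increasing grid of positive mu values."""
    r0 = soft_risk(0.0, lam)
    ok = r0 <= 4.0 * phi(lam) / lam ** 3 + tol
    prev = r0
    for mu in mus:
        r = soft_risk(mu, lam)
        ok = ok and r <= r0 + mu * mu + tol      # first component
        ok = ok and r <= lam * lam + 1.0 + tol   # second component
        ok = ok and r >= prev - tol              # monotone in |mu|
        prev = r
    return ok

if __name__ == "__main__":
    grid = [0.1 * k for k in range(1, 61)]
    for lam in (0.5, 1.0, 2.0, 3.0):
        print(lam, check_bounds(lam, grid))
```

The grid and the threshold values are arbitrary; the check only illustrates the inequalities and does not replace the analytic argument.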

Lemma A.2. Let $U_k$ be independent random variables with $P\{0 \le U_k \le 1\} = 1$. Set $\mu_n = n^{-1}\sum_{k=1}^n EU_k$. For all $0 < \mu_n < u < 1$,

$$P\Bigl\{n^{-1}\sum_{k=1}^n U_k > u\Bigr\} \le \exp[-nK(u,\mu_n)] \le \exp[-2n(u-\mu_n)^2], \tag{A.3}$$

where $K(p_1,p_2)$ is the Kullback-Leibler information for Bernoulli variables, defined by

$$K(p_1,p_2) = p_1\log\Bigl(\frac{p_1}{p_2}\Bigr) + (1-p_1)\log\Bigl(\frac{1-p_1}{1-p_2}\Bigr) = \int_0^{p_1-p_2}\frac{p_1-p_2-u}{(p_2+u)(1-p_2-u)}\,du.$$

Proof. Let $p_k = EU_k$ and $\delta_k$ be Bernoulli variables with $E\delta_k = p_k$. Since $EU_k^m \le p_k = E\delta_k^m$ for all integer $m > 0$ and $\log(1 + p_k(e^\lambda - 1))$ is concave in $p_k$, for $\lambda > 0$,

$$E\exp\Bigl(\lambda\sum_{k=1}^n U_k\Bigr) \le \prod_{k=1}^n Ee^{\lambda\delta_k} = \prod_{k=1}^n \bigl(1 + p_k(e^\lambda - 1)\bigr) \le \bigl(1 + \mu_n(e^\lambda - 1)\bigr)^n.$$

The first inequality of (A.3) follows from $P\{\sum_k U_k > nu\} \le e^{-\lambda nu}(1 + \mu_n(e^\lambda - 1))^n$ with $\lambda = \log[\{u(1-\mu_n)\}/\{\mu_n(1-u)\}]$. The second one follows from the integral formula of the Kullback-Leibler information and the bound

$$(p_2+u)(1-p_2-u) \le 1/4. \qquad\Box$$
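The chain of inequalities in (A.3) is easy to sanity-check in the Bernoulli case, where the left-hand side is an exact binomial tail. The sketch below is illustrative only: the function names and the values of $n$, $\mu_n$ and $u$ are assumptions of the example. It also compares the logarithmic form of $K(p_1,p_2)$ with a midpoint-rule evaluation of the integral formula.

```python
import math

def kl_bernoulli(p1, p2):
    """Kullback-Leibler information K(p1, p2) between Bernoulli(p1) and Bernoulli(p2)."""
    return p1 * math.log(p1 / p2) + (1.0 - p1) * math.log((1.0 - p1) / (1.0 - p2))

def kl_integral(p1, p2, steps=20000):
    """Midpoint-rule value of the integral formula for K(p1, p2), with p2 < p1."""
    d = p1 - p2
    h = d / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * h
        total += (d - u) / ((p2 + u) * (1.0 - p2 - u))
    return total * h

def binomial_upper_tail(n, p, u):
    """Exact P{S > n*u} for S ~ Binomial(n, p), i.e. U_k i.i.d. Bernoulli(p)."""
    return sum(math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)
               for k in range(n + 1) if k > n * u)

if __name__ == "__main__":
    n, mu, u = 50, 0.3, 0.5  # illustrative values
    print(binomial_upper_tail(n, mu, u),
          math.exp(-n * kl_bernoulli(u, mu)),
          math.exp(-2.0 * n * (u - mu) ** 2))
```

The three printed numbers should be increasing from left to right, matching the order of the terms in (A.3).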

Proof of Lemma 6.3. By (6.21) and (6.23), $E\hat\kappa_n = \kappa(G_n)$, so that by Lemma A.2,

$$P\{\pm(\hat\kappa_n - \kappa(G_n)) > u\} \le \exp(-nu^2) \qquad \forall u > 0, \tag{A.4}$$

with $U_k$ (or $1 - U_k$) being $\exp(-X_k^2/2)$. Since $\delta_n = I\{\hat\kappa_n \le b_n\}$ are Bernoulli variables,

$$\frac{1}{n}\sum_{k=1}^n E(\hat t_n(X_k) - \theta_k)^2 = \frac{1}{n}\sum_{k=1}^n E(\hat t_{n,\rho}(X_k) - \theta_k)^2(1-\delta_n) + \frac{1}{n}\sum_{k=1}^n E(s_\lambda(X_k) - \theta_k)^2\delta_n. \tag{A.5}$$

Thus, it suffices to consider the first two cases of (6.25).

Suppose $\kappa(G_n) > b_n^+ = b_n + \sqrt{2(\log n)/n}$. By (A.4)

$$P\{\delta_n = 1\} \le P\{\hat\kappa_n - \kappa(G_n) \le -\sqrt{2(\log n)/n}\} \le \exp(-2\log n) = n^{-2},$$

so that by (6.18), with $\chi_n^2 = \sum_{k=1}^n (X_k - \theta_k)^2$ and $\lambda = \sqrt{2\log n}$,

$$E\delta_n\sum_{k=1}^n (s_\lambda(X_k) - \theta_k)^2/n \le E\delta_n\bigl(\sqrt{\chi_n^2/n} + \lambda\bigr)^2 \le \int_{u_{2,n}}^\infty \bigl(\sqrt{u/n} + \lambda\bigr)^2p_n(u)\,du,$$

where $p_n(u) = (u/2)^{n/2-1}e^{-u/2}/\{2\Gamma(n/2)\}$ is the density of $\chi_n^2$ and $P\{\chi_n^2 > u_{j,n}\} = 1/n^j$. By standard large deviation theory, $u_{j,n} = n + (2+o(1))\sqrt{jn\log n}$ for each $j$. Integration by parts yields $\int_{u_{j,n}}^\infty (u/n)p_n(u)\,du \le \{u_{j,n}/n + 1\}\int_{u_{j,n}}^\infty p_n(u)\,du = (2+o(1))/n^j$. Thus,

$$E\delta_n\sum_{k=1}^n (s_\lambda(X_k) - \theta_k)^2/n \le n^{-2}(\lambda + O(1))^2 = (2+o(1))(\log n)/n^2.$$

Now consider the case $\kappa(G_n) \le b_n^- = b_n - \sqrt{3(\log n)/n}$. By Lemma A.2

$$P\{\delta_n = 0\} \le P\{\hat\kappa_n - \kappa(G_n) > \sqrt{3(\log n)/n}\} \le n^{-3}.$$

By (6.5) and (2.11) and $a_n = \sqrt{2\log n}$, $|\hat t_{n,\rho}(X_k) - X_k| \le a_n^2/(2\pi\rho) = (\log n)/(\pi\rho)$, so that

$$E(1-\delta_n)\sum_{k=1}^n (\hat t_{n,\rho}(X_k) - \theta_k)^2/n \le \int_{u_{3,n}}^\infty \bigl\{\sqrt{u/n} + (\log n)/(\pi\rho)\bigr\}^2p_n(u)\,du \le n^{-3}\bigl((\log n)/(\pi\rho) + O(1)\bigr)^2. \qquad\Box$$



Proof of Proposition 6.1. Part (i) follows from the proof of Theo- 
rem 6.3. For part (ii) we have 



/■logn f 

/ w'G (\/u) du + inf 
Jo x>l 



w"G (x) + x 



(logn) 3//2 



n 



> inf 

X>1 



G(x) + x 



(logn) 3 / 2 



n 



n>2. 



□ 



Proof of Theorem 6.4. By Proposition 6.1(i) it suffices to consider $r_0(n,G)$ in the minimum in (6.26) and independent $\{\theta_k\}$. By Lemma 6.3 it suffices to bound $R(s_\lambda,G_n)$ for $\kappa(G_n) \le b_n^+$ and $\sum_{k=1}^n E(\hat t_{n,\rho}(X_k) - \theta_k)^2/n - R^*(G_n)$ for $\kappa(G_n) > b_n^-$. In fact, by Lemma 6.2 we need



(A.6) 



^ M (logn)2 ) n(G n )<b+, 



n 



and by Theorems 6.2(i) and 6.3 and the fact that k(G) = J G{yfu)du we 
need 



1 



(A.7) - V E(i n , p (Xk) - e k f - R*{G n ) < Mn(G n ) 
n ~, 



R(G n ) > b„ 



By (6.22) and the second part of (6.20), $\kappa(G_n) \le b_n^+ = (2+o(1))(\log n)/\sqrt{n}$ implies

R(sx,G n ) < (21ogn + 1)k(G„) + - < M^l£, 
so that (A.6) holds. By (6.12) and (2.7) $\bar G_n(1) \le \kappa(G_n)$, so that
- J2 E(i n jX k ) - Okf - R*(G n ) < -K{G n ) + 0(l)b~ 



n 



k=l 



by (6.17) and the fact that b n ~ (logn) / \/n. This implies (A.7), since h{G n ) < 
$\kappa(G_n)$ by (6.22). □

Proof of Proposition 6.2. Let $\varphi_k = \varphi(\cdot;H_k)$, $k = 1,2$, $\varphi_G = (1-w)\varphi_1 + w\varphi_2$, and $G = (1-w)H_1 + wH_2$. By (6.9) and algebra

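$$R^*(G) - R^*(H_1) = w\int\Bigl(\frac{\varphi_1'}{\varphi_1}\Bigr)^2\varphi_1 + \int\Bigl(\frac{\varphi_1'}{\varphi_1}\Bigr)^2\tilde w_1\tilde w_2\varphi_G - 2\int\frac{\varphi_1'}{\varphi_1}\frac{\varphi_2'}{\varphi_2}\tilde w_1\tilde w_2\varphi_G - \int\Bigl(\frac{\varphi_2'}{\varphi_2}\Bigr)^2\tilde w_2^2\varphi_G,$$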

where $\tilde w_1 = (1-w)\varphi_1/\varphi_G \in [0,1]$ and $\tilde w_2 = 1 - \tilde w_1 = w\varphi_2/\varphi_G$. Set $q = \log(\sqrt{2}/w)$. For $q \le 1$ the right-hand side of (6.29) is greater than $\sup_G R^*(G) = 1$. Assume $q > 1$. By Hölder



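$$\int\Bigl(\frac{\varphi_k'}{\varphi_k}\Bigr)^2\tilde w_k\tilde w_2\varphi_G \le \Bigl\{\int\Bigl|\frac{\varphi_k'}{\varphi_k}\Bigr|^{2q}\tilde w_k\varphi_G\Bigr\}^{1/q}\Bigl\{\int\tilde w_2\varphi_G\Bigr\}^{1-1/q} \le (E|Z|^{2q})^{1/q}w^{1-1/q}$$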

with $Z \sim N(0,1)$, since $\varphi_k'/\varphi_k$ is the conditional expectation of $Z$ given a random variable with density $\varphi_k$. Similarly, due to $\int(\varphi_k'/\varphi_k)^2\varphi_k \le 1$,



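$$\Bigl|\int\frac{\varphi_1'}{\varphi_1}\frac{\varphi_2'}{\varphi_2}\tilde w_1\tilde w_2\varphi_G\Bigr| \le \Bigl\{\int\Bigl(\frac{\varphi_1'}{\varphi_1}\Bigr)^2\tilde w_1\tilde w_2\varphi_G\int\Bigl(\frac{\varphi_2'}{\varphi_2}\Bigr)^2w\varphi_2\Bigr\}^{1/2} \le (E|Z|^{2q})^{1/(2q)}w^{1-1/(2q)}.$$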

Thus, $|R^*(G) - R^*(H_1)| \le w(1 + w^{-1/(2q)}\|Z\|_{2q})^2$. Let $h_0(q) = \Gamma(q+1/2)e^q/q^q$. Since $h_0(q) \le h_0(q+1) \to \sqrt{2\pi}$, $\|Z\|_{2q}^{2q} = \Gamma(q+1/2)2^q/\sqrt{\pi} \le \sqrt{2}(2q/e)^q$. These two inequalities imply

$$|R^*(G) - R^*(H_1)| \le w\bigl\{1 + (\sqrt{2}/w)^{1/(2q)}\sqrt{2q/e}\bigr\}^2 = w\bigl\{1 + \sqrt{2\log(\sqrt{2}/w)}\bigr\}^2.$$

Now we prove (6.30). Let $U = \theta_1 - \theta_0$ and $H_t$ be the conditional distribution of $\theta_t = (1-t)\theta_0 + t\theta_1$ given $|U| \le \eta_2$. For $k = 0,1$, $G_k$ are mixtures of $H_k$ and the conditional distributions of $\theta_k$ given $|U| > \eta_2$, so that $|R^*(G_k) - R^*(H_k)|$ are bounded by the right-hand side of (6.29) with $w = \eta_1$. Thus, (6.30) follows from $|(d/dt)R^*(H_t)| \le \sqrt{8}\{1 + 1/\sqrt{\pi}\}\eta_2$. By (6.9) and calculus,

$$(d/dt)R^*(H_t) = -(d/dt)\int\bigl[\{E^*(x-\theta_t)\varphi(x-\theta_t)\}^2/E^*\varphi(x-\theta_t)\bigr]\,dx$$

$$= E^*\bigl[2E^*_{\cdot,t}Z\{E^*_{\cdot,t}U(1-Z^2)\} - \{E^*_{\cdot,t}Z\}^2\{E^*_{\cdot,t}UZ\}\bigr],$$

where $E^*$ is the conditional expectation given $|U| \le \eta_2$, $Z$ is an $N(0,1)$ variable independent of $(\theta_0,\theta_1)$ and $E^*_{\cdot,t}$ is the conditional expectation given $Z + \theta_t$ and $|U| \le \eta_2$. Hence,

$$|(d/dt)R^*(H_t)| \le \eta_2\bigl\{2\sqrt{E(1-Z^2)^2} + E|Z|^3\bigr\} = \eta_2\sqrt{8}\{1 + 1/\sqrt{\pi}\}. \qquad\Box$$
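The numerical constants in this proof rest on standard Gaussian moment identities, which can be checked directly. The sketch below is only an illustration with hypothetical function names: it uses $E|Z|^r = 2^{r/2}\Gamma((r+1)/2)/\sqrt{\pi}$ to verify the moment bound $E|Z|^{2q} \le \sqrt{2}(2q/e)^q$, the monotone convergence of $h_0(q) = \Gamma(q+1/2)e^q/q^q$ to $\sqrt{2\pi}$ on a few sample points, the simplification with $q = \log(\sqrt{2}/w)$, and the constant $2\sqrt{E(1-Z^2)^2} + E|Z|^3 = \sqrt{8}(1 + 1/\sqrt{\pi})$.

```python
import math

def abs_normal_moment(r):
    """E|Z|^r for Z ~ N(0, 1): 2^(r/2) * Gamma((r + 1)/2) / sqrt(pi)."""
    return 2.0 ** (r / 2.0) * math.gamma((r + 1.0) / 2.0) / math.sqrt(math.pi)

def h0(q):
    """h_0(q) = Gamma(q + 1/2) * e^q / q^q, computed on the log scale."""
    return math.exp(math.lgamma(q + 0.5) + q - q * math.log(q))

def bound_629(w):
    """w * (1 + sqrt(2 * log(sqrt(2) / w)))^2, the final bound in the display."""
    return w * (1.0 + math.sqrt(2.0 * math.log(math.sqrt(2.0) / w))) ** 2

def derivative_constant():
    """2 * sqrt(E(1 - Z^2)^2) + E|Z|^3, using E Z^2 and E Z^4 from the moment formula."""
    second = abs_normal_moment(2.0)
    fourth = abs_normal_moment(4.0)
    return 2.0 * math.sqrt(1.0 - 2.0 * second + fourth) + abs_normal_moment(3.0)

if __name__ == "__main__":
    for q in (1.0, 2.0, 5.0, 10.0):
        print(q, abs_normal_moment(2.0 * q), math.sqrt(2.0) * (2.0 * q / math.e) ** q, h0(q))
    print(bound_629(0.1), derivative_constant())
```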

In addition to Proposition 6.2, we need the following lemma for the proof 
of Proposition 6.3. 



Lemma A.3. For $p = \infty$, $\sup_{\mu_p(G)\le C}R^*(G) = \mathcal{R}_n(\Theta_{p,n}(C))$. For $0 < p < \infty$,

$$\sup_{\mu_p(G)\le C/b}R^*(G) - \mathcal{R}_n(\Theta_{p,n}(C)) \le 2\pi_0\bigl\{1 + \sqrt{2\log(\sqrt{2}/\pi_0)}\bigr\}^2 + \frac{4C^2/(b^2\pi_0^{2/p})}{\exp[nK(b^p\pi_0,\pi_0)]}$$

for all $b > 1$ and $0 < \pi_0 < 1$, where $K(p_1,p_2)$ is the Kullback-Leibler information in Lemma A.2.



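Proof. Let $\theta \sim G$ with $\mu_p(G) \le C/b$. Let $G_1$ be the distribution of $\tilde\theta = \theta I\{|\theta| \le M\}$, where $M = C/(b\pi_0^{1/p})$. Since $P\{|\tilde\theta - \theta| > 0\} = P\{|\theta| > M\} \le \mu_p^p(G)/M^p \le \pi_0$, by Proposition 6.2

$$R^*(G) \le R^*(G_1) + 2\pi_0\bigl\{1 + \sqrt{2\log(\sqrt{2}/\pi_0)}\bigr\}^2. \tag{A.8}$$

Let $\nu_n$ be the prior in $\mathbb{R}^n$ under which $\theta_k$ are i.i.d. variables with marginal distribution $G_1$. For $b > 1$ and estimators $\hat\theta_{(n)} \in \Theta_{\infty,n}(M)$,

$$E_{\nu_n}\frac{1}{n}E_{\theta_{(n)}}\|\hat\theta_{(n)} - \theta_{(n)}\|_{2,n}^2 \le \sup\Bigl\{\frac{1}{n}E_{\theta_{(n)}}\|\hat\theta_{(n)} - \theta_{(n)}\|_{2,n}^2 : \theta_{(n)} \in \Theta_{\infty,n}(M)\cap\Theta_{p,n}(C)\Bigr\} + 4M^2P_{\nu_n}\bigl\{n^{-1/p}\|\theta_{(n)}\|_{p,n} > C\bigr\}.$$

Taking the infimum on both sides above over $\hat\theta_{(n)} \in \Theta_{\infty,n}(M)$, we find by (6.1) that

$$R^*(G_1) \le \mathcal{R}_n(\Theta_{\infty,n}(M)\cap\Theta_{p,n}(C)) + 4M^2P_{\nu_n}\Bigl\{\sum_{k=1}^n\frac{|\theta_k|^p}{n} > C^p\Bigr\}, \tag{A.9}$$

since all admissible estimators are almost surely in $\Theta_{\infty,n}(M)$ when $\Theta_{\infty,n}(M)$ is the parameter space. Since $|\theta_k|^p/M^p \le 1$ are i.i.d. variables under $\nu_n$ with $E_{\nu_n}|\theta_k|^p/M^p \le \pi_0$, by Lemma A.2

$$P_{\nu_n}\Bigl\{\sum_{k=1}^n\frac{|\theta_k|^p}{nM^p} > \frac{C^p}{M^p} = b^p\pi_0\Bigr\} \le I\{b^p\pi_0 \le 1\}\exp[-nK(b^p\pi_0,\pi_0)]. \tag{A.10}$$

We complete the proof by inserting (A.10) into (A.9) and then inserting (A.9) into (A.8). □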

Proof of Proposition 6.3. The first inequality of (6.32) is that of (6.34). It follows from Lemma A.3 that $\sup_{\mu_p(G)\le C}R^*(G) - \mathcal{R}_n(\Theta_{p,n}(C))$ is bounded from above by a sum of three terms: two in Lemma A.3, and via the second inequality of (6.34), a third term bounded by

$$\sup_{\mu_p(G)\le C}R^*(G) - \sup_{\mu_p(G)\le C/b}R^*(G) \le (b^2-1)\sup_{\mu_p(G)\le C}R^*(G). \tag{A.11}$$

We choose $b$ and $\pi_0$ so that the three terms are of the same order.

Let $b^2 = 1 + \pi_0\log_+(1/\pi_0)/\sup_{\mu_p(G)\le C}R^*(G)$. By Lemma A.3 and (A.11),

sup R*(G)-n n (e p>n (C)) 
ti p {G)<C 

(A.12) 

< (M + 1 7T log + 1/tto + ^ r ' J , . 

Since $K(p_1,p_2) \ge (p_1-p_2)^2/(2p_1)$ for $p_2 < p_1 < 1$, for $1 < b^2 \le 2$ and small $\pi_0 > 0$

Ktiv \ ^ (^0 - 7T ) 2 , 2^2 1 \2 bopVglog^ (1/tto) 
2 (^ 7r o) {sup Mp(G )< G i2*(G)} 2 

where 6 = min[(&f - 1) 2 /{6V(6 2 - l) 2 } : 1 < b 2 < 2,p > 0] > 0. Thus, the 
second term in (A.12) is of the order ttq for the choice of ttq satisfying for 
certain M" < 00 

b np\$ log 2 , (1/tto) > (M"f min{l, C 2 ?' { log + (l/CP')} 2 - p '} log(C 2 /^ +2/p ), 

since sup MG) < G R*(G) < M" min{ 1 , C p ' {log + ( 1/ C p ' ) } 1 ^p'/ 2 } by (6.33). This 
holds with 

7T 3 log + (l/7T )=6(M") 2 min{l,C 2 P'{log + (l/CP')} 2 - p '}log + (C p ')/(b np 2 p f ). 

Hence, the conclusion holds, since x 3 log + (l/x) = 0(y) iff xlog + (l/x) = 
0(l)*(y) for xAy > 0. □ 



Proof of Theorem 7.2. The proof of Theorem 3.2 provides an outline of the proof. The $o(\varepsilon^{-\eta})$ there is clearly bounded by a polynomial of $\log_+(1/\varepsilon)$. We omit the details, since the full proof of Theorem 7.2 can be found in Zhang (2000). □

Proof of Theorem 7.3. The computation in the proof is similar to that in the proof of Theorem 7.2 and is provided in Zhang (2000). We again provide just an outline of the proof of (7.15) here. Let $L(\varepsilon)$ denote generic polynomials of $\log_+(1/\varepsilon)$. Let $B = B^\alpha_{p,q}(C)$ for fixed $(\alpha,p,q,C)$. By (7.12) and (6.31),

oo 

(A. 13) supi?( £ '*)09) -T&\B) <L(e)e 2 V 2^(2^, 2-^ Q+1 / 2 Ve), 

j=2 

where $\tilde r^*(n,C) = \min(1, C^{2p'/3})/n^{1/3}$. Splitting the sum in the right-hand side of (A.13) into two parts, for $2^{j(\alpha+1/2)}\varepsilon > 1$ and $\le 1$, we find that the sum is of the order $\varepsilon^{-(2/3)/(\alpha+1/2)} = \varepsilon^{-1/(\alpha_3+1/2)}$. Thus, the left-hand side of (A.13) is bounded by $L(\varepsilon)\varepsilon^{2-1/(\alpha_3+1/2)} = L(\varepsilon)\varepsilon^{2\alpha_3/(\alpha_3+1/2)}$.

Now we prove (3.10) and (7.11). Let $j^* \ge 0$ satisfy $2^{j^*(\alpha+1/2)} \le C/\varepsilon < 2^{(j^*+1)(\alpha+1/2)}$ and let $P$ be a probability measure under which $\beta_{j^*k}$ are i.i.d. uniform variables in $[-\varepsilon,\varepsilon]$ and $\beta_{jk} = 0$ for $j \ne j^*$. By (3.9), $\|\beta\|^\alpha_{p,q} \le 2^{j^*(\alpha+1/2-1/p)}\varepsilon 2^{j^*/p} \le C$ almost surely under $P$, so that the minimax risk in $B^\alpha_{p,q}(C)$ is no smaller than the Bayes risk under $P$. With the scale change $\beta \to \beta/\varepsilon$, we find

$$\mathcal{R}^{(\varepsilon)}(B) \ge \inf_{\hat\beta}\int\sum_k E^{(\varepsilon)}_\beta(\hat\beta_{j^*k} - \beta_{j^*k})^2\,dP = 2^{j^*}\varepsilon^2R^*(G_0) \ge (C/\varepsilon)^{1/(\alpha+1/2)}\varepsilon^2R^*(G_0)/2,$$

where $G_0$ is the uniform distribution in $[-1,1]$ and $0 < R^*(G_0) < 1$ is the optimal Bayes risk in (6.1). This proves the lower bound in (3.10), and the lower bound, (7.10) and (7.15) imply (7.11). The upper bound in (3.10) follows from (7.10), (7.1), (7.4) and (6.33). □
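The choice of the resolution level $j^*$ in the lower-bound argument can be made concrete. The following sketch is illustrative only: the function names and the sample values of $C$, $\varepsilon$ and $\alpha$ are assumptions of the example. It finds the level $j^*$ with $2^{j^*(\alpha+1/2)} \le C/\varepsilon < 2^{(j^*+1)(\alpha+1/2)}$ and compares $2^{j^*}\varepsilon^2$ with $(C/\varepsilon)^{1/(\alpha+1/2)}\varepsilon^2/2$, the two factors multiplying $R^*(G_0)$ in the display.

```python
import math

def least_favorable_level(C, eps, alpha):
    """Largest integer j >= 0 with 2**(j*(alpha + 1/2)) <= C/eps.

    Requires C >= eps so that such a level exists (assumption of this sketch).
    """
    if C < eps:
        raise ValueError("need C/eps >= 1 so that a level j >= 0 exists")
    ratio = C / eps
    j = int(math.floor(math.log2(ratio) / (alpha + 0.5)))
    # Guard against floating-point rounding at the boundary.
    while 2.0 ** ((j + 1) * (alpha + 0.5)) <= ratio:
        j += 1
    while j > 0 and 2.0 ** (j * (alpha + 0.5)) > ratio:
        j -= 1
    return j

def lower_bound_factor(C, eps, alpha):
    """(C/eps)^(1/(alpha + 1/2)) * eps^2 / 2, the factor multiplying R*(G_0)."""
    return (C / eps) ** (1.0 / (alpha + 0.5)) * eps ** 2 / 2.0

if __name__ == "__main__":
    C, eps, alpha = 1.0, 1e-3, 1.0  # illustrative values
    j = least_favorable_level(C, eps, alpha)
    print(j, 2 ** j * eps ** 2, lower_bound_factor(C, eps, alpha))
```

The factor $1/2$ in the lower bound reflects that $2^{j^*}$ may fall short of $(C/\varepsilon)^{1/(\alpha+1/2)}$ by at most a factor of two.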

Proof of Theorem 8.1. Define $G_j(u) = 2^{-j}\sum_k P\{\beta_{jk}/\varepsilon \le u\}$, and define $G_j'$ and $G_j''$ in the same way for $\beta'$ and $\beta''$. Since $\bar G_j(u) \le \bar G_j'(u/2) + \bar G_j''(u/2)$, by Proposition 6.1

r (2^,G' i ) < 2P3r pA2 (2»' J /i p (G^)) + 4ro(2»' J G;0. 

This splits the right-hand side of (8.1) into three sums. Since j2 p ((3') G 
B^(C), (7.4) holds with Gy] = G'p so that e 2 Ej V r pA2 (2 j , ^(ty) = o(l)e 2a ^ 
as in the proof of Theorem 3.2. Moreover, since J Q X G" (\/u) du < xk(P','-Je)
for x > 1,

E 2^(2*, GJ) < £ 2* Iog(2*)«035j/e) 



(A/f( £ ) \ 
lA^J<0(l)(loge) 2 m( £ ) 



so that e 2 Y lj ^ j ro( 2j , G j) is also of order o(l)e 2a /( Q+1 / 2 ) . Thus, the right- 
hand side of (8.1) is uniformly o{\)e 2a ^ a+l / 2 ^ over j3 G B^ 6 ). This proves (8.3). 
It follows from (7.1) that ER^ifi) < e 2 J2j 2m*{EG ( f j \) = e 2 J2j 2^R*{Gj). 

The total ideal risk for blocks with $2^j = o(1)\varepsilon^{-1/(\alpha+1/2)}$ is $o(1)\varepsilon^{2\alpha/(\alpha+1/2)}$.
For blocks with 2^ « e -V(«+i/2) ? = (l + o(l))12*(G^) by Proposition 

6.2. For blocks with $\varepsilon^{-1/(\alpha+1/2)} = o(2^j)$, the total ideal risk is smaller than the optimal soft thresholding risk, which is $o(1)\varepsilon^{2\alpha/(\alpha+1/2)}$ as in the proof of (8.3). Thus, (8.4) holds. We omit certain details. □

Proof of Theorems 4.1 and 9.1. We first prove Theorem 9.1(i). It follows from the proof of Lemma 7 in Brown, Cai, Low and Zhang (2002) that

njvo p oi v0 r 

Efihk - fak) 2 < / (/ - /ivo) 2 %,* - 3— / (/ - f j+ i)% >k 

(A. 14) 

2i 2 J ' V0 00 2 

= 4 ^fc J 0'>o}+-^- E E^%,fc(W2^), 

£=2+1 m=l 

since (/^ + i - /^)lf )Tn = &i,m$l,m and / - /e| 2 lf, m = /? 2 m . Thus, for p' = 

2 J V1 

E ^/l/^'.fc - ^feP 

fc=l 

(A.15 

oO'voy/2 r 2^vi 00 2 e } 

<-^72- 2^Ei^r'^->o}+ e EiW • 

fc=l i=j+lm=l ) 



Since E^l I W < 2-WV0)p'(a + l/2-l/p') ( || ) g||a , )p' and < m a by (A15) 



2^ VI \ W 



^ E 14* - M P j < M ay ||/3||^V2ViV, 

so that (9.1) holds. Furthermore, (A.15) with p = 2 implies that Ef\\Uj$ 
PW2 is bounded by 

{J 2i 00 2J "I 

- /3-i,ii 2 +^/EE \hh - PjM 2 + E E#,*f 
j=0fe=l j=J+lfc=l J 
oo /oM(J+l) o2+£ 

<E 



1=0 



N 



2 2+e \ ( 2 A 

+ —HI <J} + I{£ > J}) f E l&,m| P J 



2/p' 



<(ll/3||^) 2 



— V 2^ 1 - 2 ( a + 1 / 2 -Vp')} 

I P J+1 i A 2 -2£(a+l/2-l/p') 
^ ' £=J+1 



<M 



a ^2 



{£/W-l> + i + (£ + l> 



-2J(a+l/2-l/p') 



This implies (9.2) and completes the proof of Theorem 9.1(i).
Now we prove that for e = a/y/N and the 0v in Theorem 4.1 



sup E f \\f N - /|| 2 < (1 + ( N )Tl^(B« q (C)). 



(A.16) 

ft 

Define G'^u) = nj 1 J2 k 5 j,kI0j,k < u} and Gj(u) = 2~ j £ fc p iPj,k < u} with 
the f} j)1t in (4.7). Let y j>k = 5 jtk y jtk + (1 - <5 iijfc )iV(0,e 2 ). By Theorem 3.1 
and (3.3) 



J 

j=-l k 



^ E f E M Y. S jA t j(yj,k) -4,fe) 2 + e 2 51 E fnir p/ {n j ,n p f{G' j )) 
j=-i {tj} k j=-i 

j j 

<E f £ irfE(M%-*)-&k) 2 + e 2 E VrsWwiGj)). 
j=-i {jl fc i=-i 

Since t/^fc ~ N{j3j )k ,e 2 ) given ij, a slight modification of the proof of Theo- 
rem 8.1 implies that the right-hand side above is bounded by (l + Ov^^-^-Bp ^C)), 
in view of Theorem 9. 1 (i) . This and (9.2) imply (A.16) with the Cn in The- 
orem 4.1 for the choices of J = Jj\r in Theorem 4.1. 

It remains to show that for a + 1/2 — l/p> a/(a + 1/2) and / based on 
(4.1) 

(A.17) mf S upE f \\f-f\\ 2 >(l + o(l))K&(B« q (C)). 

Let $T_N = T_{N,\{t_i\}}$ be the randomized mappings $\{Y_i,t_i,i \le N\} \to \{y_{j,k},\delta_{j,k}\}$. Brown, Cai, Low and Zhang (2002) proved that due to the orthonormality of the mappings $T_N$ given $\{t_i\}$, the inverse mappings of $T_N$ provide $\{y_{j,k},\delta_{j,k}\} \to \{Y_i',t_i,i \le N\}$ satisfying (4.1) with regression functions $f'(t)$ such that (A.14) holds with $\beta_{j,k} = \int f'\phi_{jk}$. This yields (A.17) by repeating the proof of (A.16). □